For the lucky few who will hopefully get to know me over time, you will learn that I’m passionate about Large Language Models (LLMs).
So, I’ve been spending some time learning how I can self-host open source models. Partly because I think the pace at which open source models are keeping up with the frontier closed source models is astounding, and partly because I needed a way to quickly (and cheaply) test model outputs locally.
In this blog post I will explain how to self-host an LLM using:
- Ollama
- Docker
- Open WebUI
Why Self-Hosting an LLM Matters
Self-hosting LLMs offers a bunch of benefits, for everyone from the budding developer to large enterprises. By using tools like Ollama, you gain several advantages:
- Data Privacy: Avoid sending sensitive or personal information to external servers.
- Cost Savings: Cloud-hosted models can get expensive. And while open source models aren’t quite the best of the best, if your use case doesn’t need a top-spec model you can save a buck or two by running your own – it’s free!
- Offline Availability: There is no requirement for internet access to run a local model. I think this is a pretty cool point. Imagine telling someone even three years ago that you could have an AI running on your own kit with no internet access.
Installing Ollama
What is Ollama?
For those who don’t yet know, Ollama is “an open-source tool designed to simplify the local deployment and operation of large language models. Actively maintained and regularly updated, it offers a lightweight, easily extensible framework that allows developers to effortlessly build and manage LLMs on their local machines. This eliminates the need for complex configurations or reliance on external servers, making it an ideal choice for various applications.” (How to run Ollama on Windows. Getting Started with Ollama: A… | by Research Graph | Medium).
In short it lets you host LLMs locally.
You can download Ollama for macOS, Linux & Windows by following this link: Ollama
I will be using the Windows version of Ollama.
Why Ollama?
From the paragraph above you can see that Ollama lets you run models locally – yippee! But… why Ollama?
Well, and this is the cool part, Ollama downloads models for you from its model library with a single command and can use GPU acceleration if your computer has a GPU (don’t worry if it doesn’t or if it’s an older rig – some of the smaller models run fine on a CPU).
Another important point is that by running a model locally, any and all data the model interacts with never leaves your computer. A bonus point for anyone who’s data-aware!
Installation
As mentioned earlier, I run the Windows version of Ollama. I did manage to get WSL installed and dipped my toes into some Linux stuff, but it wasn’t something I needed to use. I needed something easy and fast to install, so Windows it was.
To install Ollama simply run the Windows installer you downloaded from the site. Once installed you should see the llama symbol in your system tray:

To double check it’s running correctly, go to http://localhost:11434 in your browser and you should see this:

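If you prefer the terminal, you can also double check with curl (which ships with Windows 10 and later):
curl http://localhost:11434
You should get back a short plain-text reply along the lines of “Ollama is running”.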
Running a Model
So you’ve got Ollama installed 🥳 now to the neat part – running a local model. Thankfully doing that via Ollama is easy peasy!
For this guide we will try running one of the most popular open source models, Llama 3.1. I will be running the 8B version.
Note - You can get the latest list of available models via the Models page here: library (ollama.com)
Follow these steps to run Llama 3.1:
- Open Command Prompt, PowerShell or Windows Terminal
- Enter the command ollama run llama3.1:8b
That’s it! Once the model has been pulled you should see the below:

Now you can query the model with any questions you might have. I’m always astounded at the speed at which these LLMs can respond.

If you want to see a list of currently installed models, you can run the command ollama list:

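While we’re at it, a few other Ollama commands I find handy (all part of the standard Ollama CLI):
- ollama pull llama3.1:8b downloads a model without starting a chat session
- ollama ps shows which models are currently loaded into memory
- ollama rm llama3.1:8b removes a model you no longer need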
To end your session simply type /bye.
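One more thing worth knowing: Ollama also exposes a small REST API on that same port 11434, so you can script against your local model instead of using the interactive prompt. A minimal sketch, run from a bash-style shell such as Git Bash or WSL (PowerShell handles the quoting differently):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": false}'
The reply comes back as JSON, with the generated text in the response field.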
OPTIONAL: Enable GPU Acceleration
If you have an NVIDIA GPU I highly recommend you enable GPU acceleration.
For anyone that isn’t yet aware, LLMs perform inference (token generation) significantly faster on GPUs. I won’t go deep into why in this blog post (in short, token generation is dominated by large matrix multiplications, which GPUs can run in parallel far better than CPUs), so look it up if you want to deep dive – it’s pretty cool.
Enabling GPU acceleration will make the model respond much, much faster. To enable it you need to install the CUDA Toolkit from NVIDIA. You can find the link here: Download CUDA toolkit, and follow the steps for installation.
Ollama will automatically detect what GPU you are using, if any.
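If you want to sanity check that the GPU is actually being used, here are two quick checks while a model is answering a prompt (assuming an NVIDIA card with up-to-date drivers):
- Run nvidia-smi and you should see the Ollama process taking up GPU memory.
- Run ollama ps and the PROCESSOR column should read something like 100% GPU rather than CPU.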
Docker
Installing Docker
Open WebUI is easiest to run inside a Docker container, so that’s what we need to set up next.
If you’re new to Docker, no worries! It’s just a tool for running programs in isolated environments called containers.
You first need to download Docker Desktop from the Docker site.
Follow the instructions provided by the installer. After installation, open Docker Desktop to ensure everything was installed smoothly.
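For a quick sanity check that Docker is working, you can run these standard Docker commands from a terminal:
docker --version
docker run hello-world
The second command pulls a tiny test image and prints a “Hello from Docker!” message if everything is set up correctly.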
Installing Open WebUI in a Docker Container
Now we’ve got Ollama and Docker installed, we need to spin up a container running Open WebUI. Thankfully it’s as easy as one command:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Paste that command into your terminal and you should see a new container running in Docker Desktop:

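If you prefer the terminal over Docker Desktop, you can confirm the container is up and peek at its logs (the open-webui name comes from the --name flag in the command above):
docker ps --filter name=open-webui
docker logs open-webui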
Open WebUI
What is Open WebUI?
Oh am I glad you asked!
Open WebUI is an awesome front end for self-hosted LLMs. There’s a boatload of features built into this open source project. Here are just some of them:
- Video Calling: Enable video calls with supported vision models.
- Model Builder: Easily create Ollama models directly in Open WebUI.
- Local RAG: Easily implement RAG on your own PC.
- Web Search: Let your local models search the web to find better answers for you.
- Python Code Execution: Execute Python code directly within the UI.
- @Model Calling: In the same chat you can @ any loaded model to pick and choose the best features as and when you need them, e.g. calling a vision-capable model to explain an image or extract text from it.
The list of what Open WebUI can do is extensive and I highly recommend looking at the full list on their site. My little list above isn’t doing this project justice ⭐ Features | Open WebUI
Running Open WebUI
So we’ve got Ollama installed, Docker Desktop installed and an image of Open WebUI up and running. What now?
To get hands-on, all you have to do is click the link underneath Port(s) in Docker Desktop:

This will open Open WebUI on port 3000 (i.e. http://localhost:3000, the host port we mapped in the docker run command above).
You will be met with a sign-in page. To date I’m not actually sure why this is needed. It doesn’t look like the data actually goes anywhere (otherwise what would be the point in all of this being run locally?). So for now just create an account, and don’t worry about having to verify any emails to get access.
AND WE’RE IN!
Once you’ve signed up you will be met with the landing page of Open WebUI. From here you can choose which model you interact with:

For anyone familiar with the ChatGPT UI this should make navigating Open WebUI relatively easy.
One of the first things I did when I installed the application was enable model memory and tell it my name. I just think having an AI call you by name is really cool. The best part about this feature is that it applies to any model you load.
If you want the same follow these steps:
1. Click your username, then Settings:

2. Go to Personalisation and enable memory

3. Click Manage and tell the model your name

And here’s the output when I said hi to Llama 3.1:

Conclusion
In this guide I’ve walked you through the three steps necessary to run a local LLM with a proper front end: installing Ollama, installing Docker, and running Open WebUI.
I would recommend checking out the official guidance or some YouTube videos if you want to delve deeper into what Open WebUI is capable of, as I can’t cover all the features in one post.
Hopefully you can now see how simple it is to get up and running with your own personal AI.
Have fun!
