For the lucky few who will hopefully get to know me over time, you will learn that I’m passionate about Large Language Models (LLMs).
So, I’ve been spending some time learning how I can self-host open source models. Partly because I think the pace at which open source models are keeping up with the frontier closed source models is astounding, and partly because I needed a way to quickly (and cheaply) test model outputs locally.
In this blog post I will explain how to self-host an LLM using:
- Ollama
- Docker
- Open WebUI
Why Self-Hosting an LLM Matters
Self-hosting LLMs offers a bunch of benefits, for everyone from the budding developer to large enterprises. By using tools like Ollama, you gain several advantages:
- Data Privacy: Avoid sending sensitive or personal information to external servers.
- Cost Savings: Cloud-hosted models can get expensive. And while open source models aren’t quite the best of the best, if your use case doesn’t need a top-spec model you can save a buck or two by running your own – it’s free!
- Offline Availability: There is no requirement for internet access to run a local model. I think this is a pretty cool point. Imagine telling someone even three years ago that you could have an AI running on your own kit with no internet access.
Installing Ollama
What is Ollama?
For those who don’t yet know, Ollama is “an open-source tool designed to simplify the local deployment and operation of large language models. Actively maintained and regularly updated, it offers a lightweight, easily extensible framework that allows developers to effortlessly build and manage LLMs on their local machines. This eliminates the need for complex configurations or reliance on external servers, making it an ideal choice for various applications.” (How to run Ollama on Windows. Getting Started with Ollama: A… | by Research Graph | Medium).
In short it lets you host LLMs locally.
You can download Ollama for macOS, Linux & Windows by following this link: Ollama
I will be using the Windows version of Ollama.
Why Ollama?
From the paragraph above you can see that Ollama lets you run models locally – yippee! But… why Ollama?
Well, and this is the cool part, Ollama downloads models for you from its model library with a single command and can use GPU acceleration if your computer has a GPU (don’t worry if it doesn’t or if it’s an older rig – some of the smaller models run fine on a CPU).
Another important point is that by running a model locally, any and all data the model interacts with never leaves your computer. A bonus point for anyone who’s data-aware!
Installation
As mentioned earlier, I run the Windows version of Ollama. I did manage to get WSL installed and dipped my toes into some Linux stuff, but it wasn’t something I needed to use. I needed something easy and fast to install, so Windows it was.
To install Ollama simply run the Windows installer you downloaded from the site. Once installed you should see the llama symbol in your system tray:

To double check it’s running correctly, go to http://localhost:11434 in your browser and you should see this:

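If you prefer the terminal, you can also double check with curl (which ships with Windows 10 and later):
curl http://localhost:11434
You should get back a short plain-text reply along the lines of “Ollama is running”.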
Running a Model
So you’ve got Ollama installed 🥳 now to the neat part – running a local model. Thankfully doing that via Ollama is easy peasy!
For this guide we will try running one of the most popular open source models, Llama 3.1. I will be running the 8B version.
Note - You can get the latest list of available models via the Models page here: library (ollama.com)
Follow these steps to run Llama 3.1:
- Open Command Prompt, PowerShell or Windows Terminal
- Enter the command ollama run llama3.1:8b
That’s it! Once the model has been pulled you should see the below:

Now you can query the model with any questions you might have. I’m always astounded at the speed at which these LLMs can respond.

If you want to see a list of currently installed models, you can run the command ollama list:

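While we’re at it, a few other Ollama commands I find handy (all part of the standard Ollama CLI):
- ollama pull llama3.1:8b downloads a model without starting a chat session
- ollama ps shows which models are currently loaded into memory
- ollama rm llama3.1:8b removes a model you no longer need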
To end your session simply type /bye.
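One more thing worth knowing: Ollama also exposes a small REST API on that same port 11434, so you can script against your local model instead of using the interactive prompt. A minimal sketch, run from a bash-style shell such as Git Bash or WSL (PowerShell handles the quoting differently):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": false}'
The reply comes back as JSON, with the generated text in the response field.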
OPTIONAL: Enable GPU Acceleration
If you have an NVIDIA GPU I highly recommend you enable GPU acceleration.
For anyone that isn’t yet aware, LLMs perform inference (token generation) significantly faster on GPUs. I won’t go deep into why in this blog post (in short, token generation is dominated by large matrix multiplications, which GPUs can run in parallel far better than CPUs), so look it up if you want to deep dive – it’s pretty cool.
Enabling GPU acceleration will make the model respond much, much faster. To enable it you need to install the CUDA Toolkit from NVIDIA. You can find the link here: Download CUDA toolkit, and follow the steps for installation.
Ollama will automatically detect what GPU you are using, if any.
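If you want to sanity check that the GPU is actually being used, here are two quick checks while a model is answering a prompt (assuming an NVIDIA card with up-to-date drivers):
- Run nvidia-smi and you should see the Ollama process taking up GPU memory.
- Run ollama ps and the PROCESSOR column should read something like 100% GPU rather than CPU.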
Docker
Installing Docker
Open WebUI is easiest to run inside a Docker container, so that’s what we need to set up next.
If you’re new to Docker, no worries! It’s just a tool for running programs in isolated environments called containers.
You first need to download Docker Desktop from the Docker site.
Follow the instructions provided by the installer. After installation, open Docker Desktop to ensure everything was installed smoothly.
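For a quick sanity check that Docker is working, you can run these standard Docker commands from a terminal:
docker --version
docker run hello-world
The second command pulls a tiny test image and prints a “Hello from Docker!” message if everything is set up correctly.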
Installing Open WebUI in a Docker Container
Now we’ve got Ollama and Docker installed, we need to spin up a container running Open WebUI. Thankfully it’s as easy as one command:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Paste that command into your terminal and you should see a new container running in Docker Desktop:

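If you prefer the terminal over Docker Desktop, you can confirm the container is up and peek at its logs (the open-webui name comes from the --name flag in the command above):
docker ps --filter name=open-webui
docker logs open-webui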
Open WebUI
What is Open WebUI?
Oh am I glad you asked!
Open WebUI is an awesome front end for self-hosted LLMs. There’s a boatload of features built into this open source project. Here are just some of them:
- Video Calling: Enable video calls with supported vision models.
- Model Builder: Easily create Ollama models directly in Open WebUI.
- Local RAG: Easily implement RAG on your own PC.
- Web Search: Let your local models search the web to find better answers for you.
- Python Code Execution: Execute Python code directly within the UI.
- @Model Calling: In the same chat you can @ any loaded model to pick and choose the best features as and when you need them, e.g. calling a vision-capable model to explain an image or extract text from it.
The list of what Open WebUI can do is extensive and I highly recommend looking at the full list on their site. My little list above isn’t doing this project justice ⭐ Features | Open WebUI
Running Open WebUI
So we’ve got Ollama installed, Docker Desktop installed and an image of Open WebUI up and running. What now?
To get hands-on, all you have to do is click the link underneath Port(s) in Docker Desktop:

This will open Open WebUI on port 3000 (i.e. http://localhost:3000, the host port we mapped in the docker run command above).
You will be met with a sign-in page. To date I’m not actually sure why this is needed. It doesn’t look like the data actually goes anywhere (otherwise what would be the point in all of this being run locally?). So for now just create an account, and don’t worry about having to verify any emails to get access.
AND WE’RE IN!
Once you’ve signed up you will be met with the landing page of Open WebUI. From here you can choose which model you interact with:

For anyone familiar with the ChatGPT UI this should make navigating Open WebUI relatively easy.
One of the first things I did when I installed the application was enable model memory and tell it my name. I just think having an AI call you by name is really cool. The best part about this feature is that it applies to any model you load.
If you want the same follow these steps:
1. Click your username, then Settings:

2. Go to Personalisation and enable memory

3. Click Manage and tell the model your name

And here’s the output when I said hi to Llama 3.1:

Conclusion
In this guide I’ve walked you through the three steps necessary to run a local LLM with a proper front end: installing Ollama, installing Docker, and running Open WebUI.
I would recommend checking out the official guidance or some YouTube videos if you want to delve deeper into what Open WebUI is capable of, as I can’t cover all the features in one post.
Hopefully you can now see how simple it is to get up and running with your own personal AI.
Have fun!
