Overview
This step-by-step guide shows how to self-host Open WebUI with Ollama on Ubuntu 24.04 using Docker and persistent volumes. You will get a browser-based chat UI that talks to local large language models (LLMs), with optional NVIDIA GPU acceleration for faster inference. The setup is repeatable, easy to update, and suitable for lab, workstation, or homelab deployments.
What You’ll Build
You will deploy two containers with Docker Compose: ollama (the LLM runtime and model manager) and openwebui (the web interface). We will bind ports, persist models and settings in volumes, and optionally enable GPU. By the end, you will be able to chat with models such as llama3.1:8b directly from your browser at http://SERVER_IP:3000.
Prerequisites
- Ubuntu 22.04 or 24.04 (64-bit), a user with sudo, and a stable internet connection.
- At least 16 GB RAM recommended for 7B–8B class models; more for larger models.
- Optional NVIDIA GPU (Turing or newer) for acceleration.
- Open TCP ports 3000 (Open WebUI) and 11434 (Ollama) if accessing from other devices; see the UFW example below.
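If the server runs UFW, the rules below are a minimal sketch for opening both ports; adjust or skip them if you use a different firewall or only access the UI from the machine itself:
sudo ufw allow 3000/tcp
sudo ufw allow 11434/tcp
sudo ufw status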
1) Optional: Install NVIDIA Drivers and Container Toolkit
Skip this section if you will run on CPU only. For GPU acceleration, install the NVIDIA driver and the NVIDIA Container Toolkit so Docker can pass the GPU to containers.
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
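When the machine comes back up, confirm the driver loaded before continuing; nvidia-smi should list your GPU and the driver version:
nvidia-smi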
After the reboot, install the container toolkit and configure Docker to use it:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
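As a quick sanity check that Docker can now pass the GPU into containers, run nvidia-smi inside a throwaway container; the toolkit injects the driver utilities, so the plain ubuntu image is enough. If this prints the same GPU table you saw on the host, the toolkit is configured correctly:
docker run --rm --gpus all ubuntu nvidia-smi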
2) Install Docker Engine and Compose
Install Docker using the convenience script, then add your user to the docker group. Log out/in or run newgrp to apply the group change immediately.
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
3) Create the Docker Compose File
Create a project folder and a minimal Compose file that brings up Ollama and Open WebUI. The default snippet runs on CPU; GPU instructions are shown below.
mkdir -p ~/openwebui-ollama && cd ~/openwebui-ollama
nano docker-compose.yml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      # GPU (NVIDIA): uncomment the next line together with "runtime: nvidia" below
      # - NVIDIA_VISIBLE_DEVICES=all
    # GPU (NVIDIA): uncomment if you have a supported GPU and the container toolkit installed
    # runtime: nvidia
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
Note: If your Docker setup prefers Compose's newer GPU syntax, you can replace the commented runtime and NVIDIA_VISIBLE_DEVICES lines above with the following under the ollama service:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
Docker Compose v2 supports this device-reservation syntax outside of Swarm, where it originated; if it does not work on your setup, use the runtime: nvidia method instead.
4) Start the Stack
Bring the services online in the background and watch logs for a minute:
docker compose up -d
docker compose ps
docker logs -f ollama
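To confirm Ollama is answering before you open the UI, query its API from the host; a quick check (assuming curl is installed) is:
curl http://localhost:11434/api/version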
5) Pull a Model
Use Ollama to download a model into the persistent volume. Start with an 8B class model for a good balance of quality and resource usage:
docker exec -it ollama ollama pull llama3.1:8b
# or another model:
# docker exec -it ollama ollama pull qwen2.5:7b-instruct
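After the download finishes, you can confirm the model landed in the persistent volume by listing what Ollama has available:
docker exec -it ollama ollama list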
Once pulled, browse to http://SERVER_IP:3000. On first visit, Open WebUI asks you to create an account, and the first account becomes the admin. Select the model in the dropdown before chatting; you can manage prompts, history, and settings from the UI.
6) Verify GPU Acceleration (Optional)
Confirm that the container sees your GPU and that inference uses it. If you enabled the GPU block and installed the toolkit, both commands should work:
docker exec -it ollama nvidia-smi
docker exec -it ollama bash -lc 'ollama run llama3.1:8b "What is the speed of light?"'
If the first command fails, re-check your driver, toolkit, and Docker runtime configuration. On CPU-only systems, skip this step.
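Recent Ollama releases can also report where a loaded model is running. After issuing a prompt, ollama ps lists the loaded models and shows whether they sit on the GPU, the CPU, or are split between both (column names may vary by version):
docker exec -it ollama ollama ps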
7) Backups, Updates, and Maintenance
- Backup: the ollama volume holds your models; openwebui holds settings and history. Note that Compose prefixes volume names with the project name (the folder from step 3), so on this setup the actual volumes are openwebui-ollama_ollama and openwebui-ollama_openwebui; confirm the names with docker volume ls. You can back them up with a simple tar job:
docker run --rm -v openwebui-ollama_ollama:/data -v "$PWD":/backup alpine \
tar czf /backup/ollama-models.tgz -C / data
docker run --rm -v openwebui-ollama_openwebui:/data -v "$PWD":/backup alpine \
tar czf /backup/openwebui-data.tgz -C / data
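Restoring is the same operation in reverse; this sketch stops the stack so nothing writes to the volumes, extracts the archives created above, then brings the services back:
docker compose down
docker run --rm -v openwebui-ollama_ollama:/data -v "$PWD":/backup alpine \
tar xzf /backup/ollama-models.tgz -C /
docker run --rm -v openwebui-ollama_openwebui:/data -v "$PWD":/backup alpine \
tar xzf /backup/openwebui-data.tgz -C /
docker compose up -d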
- Update: pull new images and recreate containers without losing data:
docker compose pull
docker compose up -d
- Cleanup: remove unused layers and stopped containers periodically:
docker system prune -f
8) Troubleshooting Tips
Open WebUI cannot reach Ollama: Ensure OLLAMA_BASE_URL is set to http://ollama:11434 and that both services share the same compose project network (default). Restart with docker compose up -d.
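To verify the containers really share a network, inspect the project's default network (named after the project folder, so openwebui-ollama_default here); both containers should be listed under Containers:
docker network inspect openwebui-ollama_default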
GPU not detected: Confirm nvidia-smi works on the host, the NVIDIA Container Toolkit is installed, and your compose file uses either runtime: nvidia or the deploy.devices syntax. Restart Docker after configuration changes.
Out of memory or slow inference: Choose a smaller quantized model (e.g., q4 variants), increase swap on low-RAM systems, or upgrade GPU VRAM. Pulling a different tag is as simple as docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_0.
Port conflicts: Change the left side of the port mappings (e.g., "11435:11434") and update your firewall or reverse proxy rules accordingly.
Security Notes
By default, these services are reachable from your network. For internet access, place them behind a reverse proxy with TLS (Nginx, Caddy, or Traefik), restrict source IPs, or expose via a secure tunnel. Avoid exposing Ollama’s port directly to the public internet.
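One low-effort hardening step, if the reverse proxy runs on the same host, is to bind the published ports to the loopback interface so other machines cannot reach the containers directly; a sketch of the change in docker-compose.yml:
    ports:
      - "127.0.0.1:3000:8080"
Remote clients then connect only through the proxy, which terminates TLS.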
Wrap-up
You now have a modern, local-first AI chat stack running on Docker with persistent storage and optional GPU acceleration. Add more models with ollama pull, keep images updated with docker compose pull, and back up volumes regularly. This setup scales from a developer laptop to a powerful workstation while keeping your data on your own hardware.