Overview
This guide shows you how to self-host Ollama and Open WebUI on Ubuntu using Docker Compose with NVIDIA GPU acceleration. Ollama makes it easy to run popular local LLMs (like Llama 3, Mistral, Phi, and Qwen), while Open WebUI provides a clean, multi-user chat interface, prompt management, and model switching. By the end, you will have a persistent, GPU-enabled AI stack reachable in your browser, suitable for personal use or a small team.
Prerequisites
- Ubuntu 22.04 or 24.04 (server or desktop), 16 GB RAM recommended.
- An NVIDIA GPU with recent drivers (8 GB VRAM or more recommended for 7B/8B models).
- Docker Engine and the Docker Compose plugin.
- A user with sudo privileges and outbound internet access.
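Before moving on, you can confirm most of these directly on the host. The commands below are standard Ubuntu and NVIDIA tools and only read information:
lsb_release -d    # Ubuntu release
free -h           # installed RAM
nvidia-smi        # NVIDIA driver version and available VRAM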
Step 1 — Install Docker and Docker Compose
If Docker is not installed, run:
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Verify Docker works: docker version and docker compose version.
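Optionally, add your user to the docker group so the remaining commands work without sudo (log out and back in afterwards for the change to take effect):
sudo usermod -aG docker $USER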
Step 2 — Enable NVIDIA GPU in Containers
Install the NVIDIA Container Toolkit so Docker can access your GPU. First, ensure the NVIDIA driver is installed and nvidia-smi works on the host. Then run:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test inside a container: docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. You should see your GPU listed.
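If the test fails, check that the configure step actually registered the NVIDIA runtime. After nvidia-ctk runtime configure, /etc/docker/daemon.json should contain an entry roughly like the following (your file may include other settings as well):
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}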
Step 3 — Create the Docker Compose file
Make a new folder for the stack and create compose.yml in it:
mkdir -p ~/ai-stack && cd ~/ai-stack
Use this minimal Compose configuration (Ollama + Open WebUI, GPU-enabled, with persistent volumes):
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    gpus: all
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    volumes:
      - openwebui:/app/backend/data
    restart: unless-stopped

volumes:
  ollama:
  openwebui:
This setup exposes Ollama on port 11434 (API) and Open WebUI on 3000 (web). Data persists in Docker volumes, so updates do not erase models or chats.
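Note that the gpus: all shorthand requires a fairly recent Docker Compose. If your Compose version rejects it, replace gpus: all under the ollama service with the equivalent long-form device reservation:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]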
Step 4 — Launch the stack and pull a model
Start both services in the background:
docker compose up -d
Check logs to confirm GPU access and healthy startup:
docker logs -f ollama and docker logs -f open-webui
Pull your first model (example: Llama 3.1 8B) and verify inference:
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama run llama3.1:8b
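You can also confirm the API answers from the host. A quick check, assuming the llama3.1:8b pull above completed:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'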
Open a browser to http://<your_server_ip>:3000, create your admin account, choose the pulled model, and start chatting.
Step 5 — Secure access and basic hardening
Open WebUI has built-in auth. The Compose file sets WEBUI_AUTH=True, which prompts for signup on first visit. After creating the admin user, disable new registrations by adding ENABLE_SIGNUP=False under the open-webui environment and redeploy with docker compose up -d.
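For example, the open-webui environment block would then read as follows before you redeploy:
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
      - ENABLE_SIGNUP=False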
If you will expose the UI on the internet, place it behind a reverse proxy with HTTPS. For example, with Caddy on the same host, you can proxy to port 3000 and get automatic TLS:
my-ai.example.com {
    reverse_proxy 127.0.0.1:3000
}
Alternatively, use Nginx and a free TLS certificate from Let's Encrypt. Restrict access with IP allowlists or SSO if available.
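A minimal Nginx server block might look like the sketch below; it assumes a certificate already issued by Certbot for my-ai.example.com, and the Upgrade/Connection headers matter because Open WebUI uses WebSockets:
server {
    listen 443 ssl;
    server_name my-ai.example.com;
    ssl_certificate /etc/letsencrypt/live/my-ai.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/my-ai.example.com/privkey.pem;
    location / {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://127.0.0.1:3000;
    }
}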
Step 6 — Useful environment options
- OLLAMA_KEEP_ALIVE: Keeps models warm for faster first-token latency (e.g., 24h).
- WEBUI_AUTH and ENABLE_SIGNUP: Enable auth and control who can create accounts.
- OLLAMA_NUM_PARALLEL: Caps how many requests each loaded model serves at once; lower values protect VRAM.
- OPENAI_API_BASE_URL (Open WebUI): Points Open WebUI at an OpenAI-compatible endpoint; Ollama exposes one at http://ollama:11434/v1 if you need that compatibility layer. The snippet below shows where these settings go in compose.yml.
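As a sketch, the options above simply extend the environment lists of the two services; the values shown are illustrative, not recommendations:
  ollama:
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_PARALLEL=2    # illustrative value
  open-webui:
    environment:
      - WEBUI_AUTH=True
      - ENABLE_SIGNUP=False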
Step 7 — Backup and update strategy
Your chats and models live in named Docker volumes. Note that Compose prefixes volume names with the project name (by default the directory name, here ai-stack), so the volumes are typically ai-stack_ollama and ai-stack_openwebui; run docker volume ls to confirm. To back them up quickly, stop the stack and archive the volumes:
docker compose down
docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup busybox tar czf /backup/ollama-vol.tar.gz -C /data .
docker run --rm -v ai-stack_openwebui:/data -v $(pwd):/backup busybox tar czf /backup/openwebui-vol.tar.gz -C /data .
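To restore onto a fresh host, recreate the volumes (running docker compose up -d once and then docker compose down is enough) and unpack the archives back into them; adjust the volume names if your project directory differs:
docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup busybox tar xzf /backup/ollama-vol.tar.gz -C /data
docker run --rm -v ai-stack_openwebui:/data -v $(pwd):/backup busybox tar xzf /backup/openwebui-vol.tar.gz -C /data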
To update, pull the latest images and redeploy:
docker compose pull && docker compose up -d
Troubleshooting
- No GPU visible in containers: confirm nvidia-smi works on the host, that the NVIDIA Container Toolkit is installed, and that gpus: all is present under the Ollama service.
- Port in use: change 11434 or 3000 in compose.yml if conflicts arise.
- Out of memory (VRAM): try a smaller model variant (e.g., 7B/8B quantized like Q4_K_M), or reduce parallel requests. Example pull: ollama pull llama3.1:8b-instruct-q4_K_M.
- Slow first response: increase OLLAMA_KEEP_ALIVE or keep frequently used models loaded.
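When diagnosing any of the above, it helps to see what is actually loaded. Inside the container, ollama ps lists each running model and whether it is on the GPU, the CPU, or split between the two:
docker exec -it ollama ollama ps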
What you built
You now have a modern, GPU-accelerated local AI stack with Ollama and Open WebUI running on Docker Compose. It is easy to manage, fast to update, and simple to secure behind HTTPS. Add more models, enable extensions, or integrate with automation tools to turn this into a private, production-ready assistant for your workstation or team.