Overview
If you want a fast, private, and low-cost way to chat with large language models on your own machine, pairing Ollama with Open WebUI inside Docker is a great setup. Ollama handles model downloads and inference (CPU or NVIDIA GPU), while Open WebUI provides a clean, modern chat interface in your browser. This guide shows how to deploy both with Docker Compose on Ubuntu 22.04/24.04 and enable GPU acceleration for significant speedups.
By the end, you will have a persistent, self-hosted AI chat running at http://localhost:3000, with models managed by Ollama at http://localhost:11434. The instructions also include CPU-only notes, backup tips, and troubleshooting for common pitfalls.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access. Windows and macOS work with Docker too, but this tutorial focuses on Ubuntu.
- For GPU acceleration: an NVIDIA GPU with a recent driver (typically 525+). CPU-only also works, just slower.
- Docker Engine and Docker Compose plugin (we will install them below).
- At least 16 GB RAM for 7B–8B models; more is better for larger models.
- Free local ports: 11434 (Ollama API) and 3000 (Open WebUI). A quick pre-flight check is shown below.
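A minimal pre-flight sketch you can run before starting (ss and free ship with Ubuntu; the nvidia-smi line only matters for GPU setups, and failing it is fine for CPU-only):
ss -lntp | grep -E ':(3000|11434)\s' || echo "ports 3000 and 11434 are free"
free -h
nvidia-smi || echo "no NVIDIA driver detected (CPU-only still works)"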
Step 1 — Install Docker Engine and Compose
Commands:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
Verify Docker works: docker run --rm hello-world. Verify Compose works: docker compose version.
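Optionally, confirm the Docker service is set to start on boot (it usually is on Ubuntu):
systemctl is-enabled docker
sudo systemctl enable --now docker   # only needed if the line above does not print "enabled"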
Step 2 — Install NVIDIA Driver (GPU users)
If you already have a recent NVIDIA driver, you can skip this step. Otherwise, install the recommended driver and reboot:
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the GPU is visible: nvidia-smi. You should see your GPU model and driver version.
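If autoinstall did not find anything to install, you can list the candidates and install a specific driver package yourself. The package name below is only an example; use whichever one ubuntu-drivers marks as recommended:
ubuntu-drivers devices
sudo apt-get install -y nvidia-driver-550   # example package name; pick the "recommended" entry from the list above
sudo reboot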
Step 3 — Enable GPU inside Docker
Install the NVIDIA Container Toolkit so Docker containers can access your GPU:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Configure Docker to use the NVIDIA runtime by default:
sudo mkdir -p /etc/docker
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker
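Alternatively, recent versions of the NVIDIA Container Toolkit ship an nvidia-ctk helper that writes this configuration for you; a minimal sketch, assuming the toolkit installed above provides it:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker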
Test GPU access in containers: docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If you see the usual output, you are set.
CPU-only? Skip Step 3 and the GPU test. The rest works the same; just remove the NVIDIA-specific line from the compose file, as noted below.
Step 4 — Create a Docker Compose file
Create a new folder and the compose file:
mkdir -p ~/ai-stack && cd ~/ai-stack
nano docker-compose.yml
Paste the following content, then save:
version: "3.9"services: ollama: image: ollama/ollama:latest container_name: ollama restart: unless-stopped ports: - "11434:11434" environment: - OLLAMA_KEEP_ALIVE=30m volumes: - ollama:/root/.ollama runtime: nvidia open-webui: image: ghcr.io/open-webui/open-webui:main container_name: open-webui restart: unless-stopped depends_on: - ollama environment: - OLLAMA_API_BASE_URL=http://ollama:11434 ports: - "3000:8080" volumes: - openwebui:/app/backend/datavolumes: ollama: openwebui:
Note: If you are running CPU-only, delete the line runtime: nvidia and keep everything else.
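If you would rather not make nvidia the default Docker runtime in Step 3, you can instead request the GPU per service. A sketch of the equivalent ollama service snippet (replace the runtime: nvidia line with the deploy block; the NVIDIA Container Toolkit from Step 3 is still required):
  ollama:
    # ...same image, ports, environment, and volumes as above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]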
Step 5 — Launch the stack and pull a model
docker compose up -d
Wait a few seconds and confirm both containers are healthy: docker ps. Next, pull a model into Ollama. Good starters are llama3.1:8b, mistral:7b, or qwen2:7b.
docker exec -it ollama ollama pull llama3.1:8b
List installed models with: docker exec -it ollama ollama list.
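Before opening the UI, you can sanity-check the model over Ollama's HTTP API from the host (the model name is the one pulled above):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'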
Step 6 — Chat in Open WebUI
Open http://localhost:3000 in your browser and create an account (the first account becomes the admin). Open WebUI should detect Ollama automatically via the OLLAMA_BASE_URL environment variable, but you can also set the connection in Settings > Connections to http://ollama:11434 (container-to-container) or http://localhost:11434 (host access). Create a new chat, choose your model (for example, llama3.1:8b), and start chatting locally.
Backups, Updates, and Performance Tips
Persistence: Your models and chats are stored in the named volumes ollama and openwebui. Back them up with docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama.tgz -C / data (and similarly for openwebui).
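For example, here is the matching Open WebUI backup and a restore (ideally stop the stack with docker compose down first so files are not changing mid-archive; archive names are just examples):
docker run --rm -v openwebui:/data -v $(pwd):/backup alpine tar czf /backup/openwebui.tgz -C / data
docker run --rm -v ollama:/data -v $(pwd):/backup alpine sh -c "cd / && tar xzf /backup/ollama.tgz"   # restore an earlier archive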
Updates: Pull fresh images and recreate: docker compose pull && docker compose up -d. Ollama keeps your models; no need to re-download.
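Old image layers pile up over time; after an update you can reclaim disk space (this removes only dangling images, never your volumes):
docker image prune -f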
Performance: Prefer GPU for best speed. If RAM/VRAM is tight, choose smaller or more heavily quantized model variants (for example, a q4_K_M tag of llama3.1:8b from the Ollama library). Set OLLAMA_KEEP_ALIVE to keep models warm between requests.
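To see which models are currently loaded and how long they will stay resident (reflecting OLLAMA_KEEP_ALIVE), ask Ollama directly:
docker exec -it ollama ollama ps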
Remote access: If exposing over the internet, place Open WebUI behind a reverse proxy (Nginx, Caddy, or Traefik) and enable authentication and TLS. Never expose Ollama directly without controls.
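One possible approach is to add a Caddy service to the same compose file and let it terminate TLS in front of Open WebUI. A minimal sketch, assuming you own chat.example.com and its DNS points at this host (the domain and the caddy_data volume name are placeholders):
  caddy:
    image: caddy:latest
    restart: unless-stopped
    depends_on:
      - open-webui
    ports:
      - "80:80"
      - "443:443"
    # caddy reverse-proxy obtains and renews TLS certificates automatically for the given domain
    command: caddy reverse-proxy --from chat.example.com --to open-webui:8080
    volumes:
      - caddy_data:/data
Also add caddy_data: under the top-level volumes: key so certificates persist, and consider removing the 3000:8080 mapping from open-webui once the proxy is in place.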
Troubleshooting
GPU not detected: Check nvidia-smi works on the host. Then run docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If that fails, revisit Step 3 and confirm /etc/docker/daemon.json is correct and Docker was restarted.
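You can also check from inside the Ollama container; with the NVIDIA runtime active, nvidia-smi is injected into GPU containers, and Ollama's startup log usually notes whether a CUDA device was found:
docker exec -it ollama nvidia-smi
docker logs ollama 2>&1 | grep -iE "cuda|gpu"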
Open WebUI cannot reach Ollama: Ensure both containers are up. Check docker logs open-webui and confirm OLLAMA_BASE_URL is set to http://ollama:11434. From the host, curl http://localhost:11434/api/tags should list installed models.
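To test connectivity from inside the Open WebUI container, you can lean on the Python runtime the image ships with (an assumption worth noting; curl may not be installed there):
docker exec -it open-webui python3 -c "import urllib.request; print(urllib.request.urlopen('http://ollama:11434/api/tags').read().decode())"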
Downloads are slow: Models can be several GB. Use a wired connection or pre-fetch models off-peak. You can also copy existing models into the ollama volume if you have them from another machine.
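For example, if you have an existing model store from another machine (the ~/.ollama/models path below is an assumed location), you can copy it into the running container's volume and verify:
docker cp ~/.ollama/models/. ollama:/root/.ollama/models/
docker exec -it ollama ollama list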
Port conflicts: If ports 11434 or 3000 are in use, change them in the compose file (left side of the colon) and recreate the stack.
What You Achieved
You now have a self-hosted AI chat stack running locally with Docker. Ollama manages lightweight, high-quality models, and Open WebUI provides a comfortable chat experience. With GPU acceleration, responses are significantly faster, and your data never leaves your machine. Extend this setup with a reverse proxy, add more models, or integrate the Ollama API into your own apps for a powerful private AI workstation.
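As a starting point for integration, the same API the web UI uses is plain HTTP; for example, Ollama's chat endpoint:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Give me one fun fact about whales."}],
  "stream": false
}'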