How to Run a Local AI Chat with Ollama and Open WebUI in Docker (GPU Ready)

Overview

If you want a fast, private, and low-cost way to chat with large language models on your own machine, pairing Ollama with Open WebUI inside Docker is a great setup. Ollama handles model downloads and inference (CPU or NVIDIA GPU), while Open WebUI provides a clean, modern chat interface in your browser. This guide shows how to deploy both with Docker Compose on Ubuntu 22.04/24.04 and enable GPU acceleration for significant speedups.

By the end, you will have a persistent, self-hosted AI chat running at http://localhost:3000, with models managed by Ollama at http://localhost:11434. The instructions also include CPU-only notes, backup tips, and troubleshooting for common pitfalls.

Prerequisites

- Ubuntu 22.04 or 24.04 with sudo access. Windows and macOS work with Docker too, but this tutorial focuses on Ubuntu.
- For GPU acceleration: an NVIDIA GPU with a recent driver (typically 525+). CPU-only also works, just slower.
- Docker Engine and Docker Compose plugin (we will install them below).
- At least 16 GB RAM for 7B–8B models; more is better for larger models.
- Open ports: 11434 (Ollama API) and 3000 (Open WebUI).
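
Before starting, you can confirm that nothing else is already listening on those two ports (no output means they are free):

ss -ltn | grep -E ':(11434|3000)\b'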

Step 1 — Install Docker Engine and Compose

Commands:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker

Verify Docker works: docker run --rm hello-world. Verify Compose works: docker compose version.
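
If docker still demands sudo in a new terminal, the group change has not taken effect yet; log out and back in, then confirm with:

# prints the groups of the current shell; "docker" should be among them
id -nG | grep -qw docker && echo "docker group active" || echo "log out and back in first"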

Step 2 — Install NVIDIA Driver (GPU users)

If you already have a recent NVIDIA driver, you can skip this step. Otherwise, install the recommended driver and reboot:

sudo ubuntu-drivers autoinstall
sudo reboot

After reboot, confirm the GPU is visible: nvidia-smi. You should see your GPU model and driver version.
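
For a quick scripted check, nvidia-smi can also print just the GPU name and driver version:

nvidia-smi --query-gpu=name,driver_version --format=csv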

Step 3 — Enable GPU inside Docker

Install the NVIDIA Container Toolkit so Docker containers can access your GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA runtime by default:

sudo mkdir -p /etc/docker
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker
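
If you prefer not to edit daemon.json by hand, recent toolkit releases ship an nvidia-ctk helper that can write an equivalent runtime entry for you; this is an optional alternative to the snippet above, and you still restart Docker afterwards:

# registers the "nvidia" runtime in /etc/docker/daemon.json; the compose file below selects it explicitly
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker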

Test GPU access in containers: docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If you see the same GPU table that nvidia-smi prints on the host, you are set.

CPU-only? Skip Step 3 and the GPU test. Everything else works the same; just remove the NVIDIA-specific line from the compose file, as noted in Step 4.

Step 4 — Create a Docker Compose file

Create a new folder and the compose file:

mkdir -p ~/ai-stack && cd ~/ai-stack
nano docker-compose.yml

Paste the following content, then save:

version: "3.9"
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
ports:
- "11434:11434"
environment:
- OLLAMA_KEEP_ALIVE=30m
volumes:
- ollama:/root/.ollama
runtime: nvidia

open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
depends_on:
- ollama
environment:
- OLLAMA_API_BASE_URL=http://ollama:11434
ports:
- "3000:8080"
volumes:
- openwebui:/app/backend/data

volumes:
ollama:
openwebui:

Note: If you are running CPU-only, delete the line runtime: nvidia and keep everything else.
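
Before launching, it is worth validating the file: docker compose config prints the fully resolved configuration, or an error if the YAML is malformed (indentation mistakes are the usual culprit).

docker compose config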

Step 5 — Launch the stack and pull a model

docker compose up -d

Wait a few seconds and confirm both containers are up and running: docker ps. Next, pull a model into Ollama. Good starter models are llama3.1:8b, mistral:7b, or qwen2:7b.

docker exec -it ollama ollama pull llama3.1:8b

List installed models with: docker exec -it ollama ollama list.
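
You can also exercise the model straight from the shell through Ollama's REST API; with stream set to false, the /api/generate endpoint returns a single JSON response:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Reply with one short sentence confirming you are running locally.",
  "stream": false
}'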

Step 6 — Chat in Open WebUI

Open http://localhost:3000 in your browser. On first visit, Open WebUI asks you to create an account, and the first account created becomes the admin. Open WebUI should auto-detect Ollama via the environment variable, but you can also set the connection in Settings > Connections to http://ollama:11434 (from inside Docker) or http://localhost:11434 (from the host). Create a new chat, choose your model (for example, llama3.1:8b), and start chatting locally.

Backups, Updates, and Performance Tips

Persistence: Your models and chats are stored in the named volumes ollama and openwebui. Keep in mind that Docker Compose prefixes volume names with the project name (by default the folder name, so here most likely ai-stack_ollama and ai-stack_openwebui); confirm the exact names with docker volume ls before backing them up, as sketched below.
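
A minimal backup-and-restore sketch, assuming the default project name ai-stack (swap in whatever volume names docker volume ls actually reports):

# back up both volumes as tarballs in the current directory
docker run --rm -v ai-stack_ollama:/data -v "$(pwd)":/backup alpine tar czf /backup/ollama.tgz -C / data
docker run --rm -v ai-stack_openwebui:/data -v "$(pwd)":/backup alpine tar czf /backup/openwebui.tgz -C / data
# restore an archive back into a volume (new or existing)
docker run --rm -v ai-stack_openwebui:/data -v "$(pwd)":/backup alpine tar xzf /backup/openwebui.tgz -C /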

Updates: Pull fresh images and recreate: docker compose pull && docker compose up -d. Ollama keeps your models; no need to re-download.
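
After a pull-and-recreate cycle, the previous image versions remain on disk as untagged layers; if space matters, you can reclaim it:

docker image prune -f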

Performance: Prefer GPU for best speed. If RAM or VRAM is tight, choose a smaller model or a more aggressively quantized variant (the Ollama library lists tags such as q4_K_M for each model). OLLAMA_KEEP_ALIVE, already set to 30m in the compose file, keeps models loaded between requests so follow-up prompts respond faster.
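
To verify which models are currently loaded and whether they landed on the GPU or the CPU, recent Ollama releases include a ps subcommand:

docker exec -it ollama ollama ps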

Remote access: If exposing the stack over the internet, place Open WebUI behind a reverse proxy (Nginx, Caddy, or Traefik) and enable authentication and TLS. Never expose the Ollama API directly without controls; if you only need it on the host itself, bind it to loopback by changing its port mapping to "127.0.0.1:11434:11434".

Troubleshooting

GPU not detected: Check nvidia-smi works on the host. Then run docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If that fails, revisit Step 3 and confirm /etc/docker/daemon.json is correct and Docker was restarted.

Open WebUI cannot reach Ollama: Ensure both containers are up. Check docker logs open-webui and confirm OLLAMA_BASE_URL is set to http://ollama:11434. From the host, curl http://localhost:11434/api/tags should list the installed models.

Downloads are slow: Models can be several GB. Use a wired connection or pre-fetch models off-peak. You can also copy existing models into the ollama volume if you have them from another machine.

Port conflicts: If ports 11434 or 3000 are in use, change them in the compose file (left side of the colon) and recreate the stack.

What You Achieved

You now have a self-hosted AI chat stack running locally with Docker. Ollama manages lightweight, high-quality models, and Open WebUI provides a comfortable chat experience. With GPU acceleration, responses are significantly faster, and your data never leaves your machine. Extend this setup with a reverse proxy, add more models, or integrate the Ollama API into your own apps for a powerful private AI workstation.
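
As a starting point for your own integrations, the same API that Open WebUI talks to is reachable from any HTTP client; for example, a minimal request against Ollama's /api/chat endpoint using the model pulled earlier:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "user", "content": "Write a haiku about containers." }
  ],
  "stream": false
}'

The same call works from any language with an HTTP client, which is all you need to wire the stack into your own tools.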
