How to Run Local AI with Ollama and Open WebUI on Docker (GPU Ready)

Overview

This step-by-step guide shows you how to deploy a private, local AI stack with Ollama (for running models) and Open WebUI (for a friendly chat interface) using Docker. You will be able to run modern language models entirely on your machine, optionally using your NVIDIA GPU for acceleration. The result is fast, offline, and secure—ideal for developers, IT teams, and privacy-focused users.

Prerequisites

- A Linux host (Ubuntu 22.04/24.04 or similar). It also works on Windows/macOS with Docker Desktop.
- Docker Engine 24+ and Docker Compose plugin.
- Open ports: 11434 (Ollama) and 3000 (Open WebUI).
- Optional GPU: recent NVIDIA driver and toolkit (CUDA-capable GPU, driver 535+ recommended).
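
If Docker is already installed, a quick sanity check of these prerequisites takes a few seconds; none of the commands change anything, and the nvidia-smi line only applies to GPU hosts:

docker --version                  # Docker Engine version (24+ recommended)
docker compose version            # confirms the Compose plugin is available
ss -ltn | grep -E ':11434|:3000'  # any output means a required port is already in use
nvidia-smi                        # optional: confirms the NVIDIA driver is loaded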

Step 1 — Install Docker (and enable non-root use)

On Ubuntu, the fastest way is the official convenience script. Run: curl -fsSL https://get.docker.com | sh

Add your user to the docker group so you can run commands without sudo: sudo usermod -aG docker $USER then newgrp docker (or log out and back in).
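
As an optional smoke test, confirm the daemon answers without sudo; hello-world is Docker's official test image:

docker run --rm hello-world                 # pulls a minimal image and prints a greeting
docker info --format '{{.ServerVersion}}'   # prints the engine version if the daemon is reachable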

Step 2 — Optional: Enable NVIDIA GPU for containers

Install the NVIDIA Container Toolkit so Docker can access your GPU. The package is distributed from NVIDIA's own apt repository, so if apt cannot find it, add that repository first by following NVIDIA's Container Toolkit installation guide, then run: sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

Configure it and restart Docker: sudo nvidia-ctk runtime configure --runtime=docker then sudo systemctl restart docker. Verify the GPU is visible from a container: docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

Step 3 — Create a docker-compose.yml

In an empty folder (for example, ~/ai-stack), create a file named docker-compose.yml with the following content:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama:
  openwebui:
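
Before starting anything, you can have Compose validate the file; with --quiet it prints nothing on success and reports errors otherwise:

docker compose config --quiet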

GPU acceleration (optional): if you installed the NVIDIA toolkit, add one line under the ollama service: gpus: all. Recent releases of the Docker Compose plugin understand this shorthand; an equivalent long form for older releases is shown after the example.
ollama:
  image: ollama/ollama:latest
  gpus: all
  ...
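
If your Compose version does not recognize the gpus shorthand, the device-reservation form from the Compose specification requests the same thing; this is a sketch to adapt, not an additional requirement:

ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]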

Step 4 — Start the stack

From the same folder, run: docker compose up -d. Docker will pull images and start two containers: ollama and open-webui.
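
To confirm both containers are up and to watch them initialize, the standard Compose commands apply:

docker compose ps                   # both services should show a running state
docker compose logs -f ollama       # follow Ollama's log (Ctrl+C to stop following)
docker compose logs -f open-webui   # follow the Web UI's log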

Step 5 — Pull a model in Ollama

You can pull models on demand. A good starting point is Meta’s Llama 3.2 3B (fast, small): docker exec -it ollama ollama pull llama3.2:3b

Other popular choices: phi3:mini, mistral, or qwen2.5:7b. List downloaded models with: docker exec -it ollama ollama list
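
Before opening the Web UI, you can sanity-check Ollama straight over its HTTP API; this assumes the default port mapping from the compose file above and the llama3.2:3b model pulled in this step:

# one-off, non-streamed completion against the Ollama API
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello in one sentence.", "stream": false}'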

Step 6 — Open the Web UI

Visit http://YOUR_SERVER_IP:3000. The first time, create an admin account. Open WebUI should auto-detect Ollama at http://ollama:11434. If not, set it under Settings → Connections → Ollama.

Start a new chat, pick the model you pulled (for example, llama3.2:3b), and send your first prompt. If you enabled the GPU, generation should be noticeably faster.

Security and networking tips

- Keep Ollama port 11434 private (one way to do this is sketched after this list). If you expose the stack to the internet, front it with a reverse proxy (Caddy, Nginx, Traefik) and enable HTTPS and authentication at the proxy layer.
- In Open WebUI, create users with least privilege and enable access controls if multiple people will connect.
- Restrict your firewall to allow only required IPs to port 3000.
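
One concrete way to keep the Ollama port private, as suggested in the first tip, is to bind it to the loopback interface in the compose file (or drop its ports mapping entirely); Open WebUI still reaches Ollama over the internal Docker network either way. Note that Docker publishes ports by editing iptables directly, so host firewall rules such as ufw may not apply to them. A sketch of the change:

ollama:
  ports:
    - "127.0.0.1:11434:11434"   # API reachable only from this host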

Updating the stack

To update images with minimal downtime, run: docker compose pull then docker compose up -d. Your models and WebUI data live in Docker volumes and persist across updates.

Backup and restore

Back up each volume to a tarball in the current directory. For Ollama: docker run --rm -v ollama:/data -v $PWD:/backup busybox tar czf /backup/ollama-vol.tgz -C / data. For Open WebUI: docker run --rm -v openwebui:/data -v $PWD:/backup busybox tar czf /backup/openwebui-vol.tgz -C / data

To restore, create the volumes (start the stack once), stop it, then extract the tarballs back to each volume using the same pattern.
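
A concrete restore sequence, following the same pattern as the backup commands above; stop the stack first so nothing writes to the volumes while you extract (existing files with the same names are overwritten):

docker compose down   # stops the containers; named volumes are kept
docker run --rm -v ollama:/data -v $PWD:/backup busybox tar xzf /backup/ollama-vol.tgz -C /
docker run --rm -v openwebui:/data -v $PWD:/backup busybox tar xzf /backup/openwebui-vol.tgz -C /
docker compose up -d  # starts the stack again with the restored data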

Troubleshooting

- Permission denied on Docker: make sure your user is in the docker group (running id should list docker), or prefix the command with sudo.
- GPU not detected: confirm nvidia-smi works on the host and that docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi shows your GPU. Verify gpus: all (or the equivalent device reservation) is present in the ollama service and restart with docker compose up -d; the diagnostics commands after this list can also help.
- Port conflict: change 11434 or 3000 host ports in the compose file if those are already in use.
- Storage usage: models can be large. Remove models with docker exec -it ollama ollama rm MODEL_NAME and clean unused images with docker image prune.
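
A few read-only commands that help when working through the issues above:

docker compose logs --tail=100 ollama      # recent Ollama logs (model load and GPU errors show up here)
docker compose logs --tail=100 open-webui  # recent Web UI logs
docker exec -it ollama ollama ps           # loaded models and whether they run on GPU or CPU
docker exec -it ollama nvidia-smi          # GPU visibility from inside the container (GPU setups only)
docker system df                           # disk space used by images, containers, and volumes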

What’s next

Add embeddings and RAG by enabling Open WebUI’s knowledge features, run multiple models in parallel, or place the stack behind a reverse proxy with a domain and TLS. With Ollama and Open WebUI on Docker, you have a fast, private, and flexible local AI platform you control.
