Overview
This guide shows how to deploy Ollama (for running local LLMs) together with Open WebUI (a clean ChatGPT-like interface) on Ubuntu 22.04/24.04 using Docker Compose and an NVIDIA GPU. You will install Docker, enable GPU acceleration with the NVIDIA Container Toolkit, run both services, pull a model, and fix common errors. If you do not have a GPU, a CPU-only note is included.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access.
- An NVIDIA GPU (Turing or newer recommended) with recent drivers (535+ works well) and at least 8 GB VRAM for medium models.
- Internet connectivity and ports 11434 (Ollama) and 3000 (Open WebUI) available.
Step 1: Verify and Install NVIDIA Drivers
Ensure a recent NVIDIA driver is installed and visible to the system:

nvidia-smi

If the output shows your driver version and GPU details, continue. If not, install the recommended driver and reboot:
sudo ubuntu-drivers install
sudo reboot
Step 2: Install Docker Engine and Compose Plugin
Set up the official Docker repository and install Docker plus the Compose plugin:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Step 3: Install NVIDIA Container Toolkit for Docker
This toolkit exposes your GPU to containers via Docker. Install and restart Docker:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test that containers can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
Step 4: Create a Docker Compose File
Create a project directory, then a compose file:
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following content. It gives the Ollama container GPU access, persists model and chat data in named volumes, and points the UI at the Ollama API.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=6h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]
    # If your Docker Compose version supports it, you can use "gpus: all" here instead of the deploy block.

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_SIGNUP=false
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  openwebui:
Note: If your Compose version errors on the deploy.resources block, upgrade Docker Compose or replace that block with gpus: all under the ollama service.
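As a minimal sketch, the ollama service would then start like this (the rest of the service definition stays exactly as above):

  ollama:
    image: ollama/ollama:latest
    gpus: all
    # ...remaining ollama settings unchanged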
Step 5: Start the Stack and Pull a Model
Launch both containers:
docker compose up -d
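To confirm both containers came up and stayed running:

docker compose ps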
Pull a model into Ollama (example: Llama 3.1 8B). You can pull from the host or exec into the container:
docker exec -it ollama ollama pull llama3.1:8b
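If you prefer not to exec into the container, the same pull can be triggered from the host through Ollama's HTTP API (this assumes port 11434 is published as in the compose file above):

curl http://localhost:11434/api/pull -d '{"model": "llama3.1:8b"}'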
Open your browser at http://<server-ip>:3000. Create an admin account on first run; if sign-up is blocked because ENABLE_SIGNUP=false, set it to true temporarily (and run docker compose up -d again), create the admin account, then disable sign-up afterwards. Choose the model you pulled and start chatting.
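To verify the model was pulled and is visible to Ollama before selecting it in the UI:

docker exec -it ollama ollama list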
Optional: CPU-Only Mode
If you do not have a supported GPU, remove the GPU settings (the deploy block) from the ollama service; the same image falls back to CPU automatically. Performance will be much slower, so prefer smaller models such as mistral or a quantized variant.
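A minimal CPU-only sketch of the ollama service (everything else in the compose file stays the same):

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=6h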
Security, Updates, and Backups
- Network access: Do not expose port 11434 to the internet. Only expose 3000 (the UI), ideally behind a reverse proxy such as Nginx, Traefik, or Caddy with HTTPS; a port-binding sketch follows this list.
- Authentication: Open WebUI supports local accounts. Disable public sign-ups by keeping ENABLE_SIGNUP=false and add users manually via the admin panel.
- Updates: Pull new images and recreate containers: docker compose pull && docker compose up -d.
- Backups: Save volumes with docker run --rm -v ollama:/v -v $PWD:/b busybox tar czf /b/ollama.tgz -C /v . and do the same for the openwebui volume. Restore by reversing the process, e.g. docker run --rm -v ollama:/v -v $PWD:/b busybox tar xzf /b/ollama.tgz -C /v.
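One way to keep both APIs off the network while still allowing a reverse proxy on the same host is to bind the published ports to the loopback interface. A minimal sketch of the relevant ports lines in the compose file (assuming the proxy runs on the same machine):

  ollama:
    ports:
      - "127.0.0.1:11434:11434"
  open-webui:
    ports:
      - "127.0.0.1:3000:8080"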
Troubleshooting
Open WebUI cannot connect to Ollama: Ensure OLLAMA_BASE_URL=http://ollama:11434 and that both services run on the same default Compose network. Check logs with docker logs open-webui.
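A quick check from the host that the Ollama API is responding (it replies with a short "Ollama is running" message):

curl http://localhost:11434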
GPU not visible in container: Confirm nvidia-smi works on host. Verify toolkit with docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi. If Compose does not support GPUs, update to the latest Docker and use gpus: all or start Ollama once with docker run --gpus all to validate.
“could not load libcuda” or CUDA errors: Upgrade to a newer NVIDIA driver, restart Docker, and ensure nvidia-container-toolkit is correctly configured. Run sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker.
Permission denied on Docker: Add your user to the docker group (sudo usermod -aG docker $USER) and re-login.
Memory or OOM kills: Use smaller models, reduce concurrent sessions, or increase swap. On multi-GPU hosts you can also control which GPUs Ollama sees by setting CUDA_VISIBLE_DEVICES in the ollama service's environment.
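If you need more swap, a common approach on Ubuntu is a swap file (8G here is just an example; size it to your workload):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab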
What’s Next
Explore model variants (chat, vision, embedding) via Ollama's registry, enable HTTPS with a reverse proxy, and connect automation via the OpenAI-compatible API endpoints exposed by Open WebUI and by Ollama itself. With this setup, you get a fast, private, self-hosted AI chat experience backed by your own hardware.
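For example, a minimal chat completion request against Ollama's OpenAI-compatible endpoint (assuming llama3.1:8b has been pulled as above and you are on the host):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'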