Overview
This guide shows how to deploy Ollama (for running local LLMs) together with Open WebUI (a clean ChatGPT-like interface) on Ubuntu 22.04/24.04 using Docker Compose and an NVIDIA GPU. You will install Docker, enable GPU acceleration with the NVIDIA Container Toolkit, run both services, pull a model, and fix common errors. If you do not have a GPU, a CPU-only note is included.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access.
- An NVIDIA GPU (Turing or newer recommended) with recent drivers (535+ works well) and at least 8 GB VRAM for medium models.
- Internet connectivity and ports 11434 (Ollama) and 3000 (Open WebUI) available.
Step 1: Verify and Install NVIDIA Drivers
Ensure a recent NVIDIA driver is installed and visible to the system:

nvidia-smi

If the output shows your driver version and GPU details, continue. If not, install the recommended driver and reboot:
sudo ubuntu-drivers install
sudo reboot
Step 2: Install Docker Engine and Compose Plugin
Set up the official Docker repository and install Docker plus the Compose plugin:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Step 3: Install NVIDIA Container Toolkit for Docker
This toolkit exposes your GPU to containers via Docker. Install and restart Docker:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test that containers can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
Step 4: Create a Docker Compose File
Create a project directory, then a compose file:
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following content. It gives the Ollama container GPU access, persists model and chat data in named volumes, and points the UI at the Ollama API.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=6h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]
    # If your Docker Compose version supports it, you can use "gpus: all" here instead of the deploy block.

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_SIGNUP=false
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  openwebui:
Note: If your Compose version errors on the deploy.resources block, upgrade Docker Compose or replace that block with gpus: all under the ollama service.
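As a minimal sketch, the ollama service would then start like this (the rest of the service definition stays exactly as above):

  ollama:
    image: ollama/ollama:latest
    gpus: all
    # ...remaining ollama settings unchanged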
Step 5: Start the Stack and Pull a Model
Launch both containers:
docker compose up -d
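To confirm both containers came up and stayed running:

docker compose ps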
Pull a model into Ollama (example: Llama 3.1 8B). You can pull from the host or exec into the container:
docker exec -it ollama ollama pull llama3.1:8b
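If you prefer not to exec into the container, the same pull can be triggered from the host through Ollama's HTTP API (this assumes port 11434 is published as in the compose file above):

curl http://localhost:11434/api/pull -d '{"model": "llama3.1:8b"}'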
Open your browser at http://<server-ip>:3000. Create an admin account on first run; if sign-up is blocked because ENABLE_SIGNUP=false, set it to true temporarily (and run docker compose up -d again), create the admin account, then disable sign-up afterwards. Choose the model you pulled and start chatting.
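To verify the model was pulled and is visible to Ollama before selecting it in the UI:

docker exec -it ollama ollama list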
Optional: CPU-Only Mode
If you do not have a supported GPU, remove the GPU settings (the deploy block) from the ollama service; the same image falls back to CPU automatically. Performance will be much slower, so prefer smaller models such as mistral or a quantized variant.
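A minimal CPU-only sketch of the ollama service (everything else in the compose file stays the same):

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=6h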
Security, Updates, and Backups
- Network access: Do not expose port 11434 to the internet. Only expose 3000 (the UI), ideally behind a reverse proxy such as Nginx, Traefik, or Caddy with HTTPS; a port-binding sketch follows this list.
- Authentication: Open WebUI supports local accounts. Disable public sign-ups by keeping ENABLE_SIGNUP=false and add users manually via the admin panel.
- Updates: Pull new images and recreate containers: docker compose pull && docker compose up -d.
- Backups: Save volumes with docker run --rm -v ollama:/v -v $PWD:/b busybox tar czf /b/ollama.tgz -C /v . and do the same for the openwebui volume. Restore by reversing the process, e.g. docker run --rm -v ollama:/v -v $PWD:/b busybox tar xzf /b/ollama.tgz -C /v.
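One way to keep both APIs off the network while still allowing a reverse proxy on the same host is to bind the published ports to the loopback interface. A minimal sketch of the relevant ports lines in the compose file (assuming the proxy runs on the same machine):

  ollama:
    ports:
      - "127.0.0.1:11434:11434"
  open-webui:
    ports:
      - "127.0.0.1:3000:8080"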
Troubleshooting
Open WebUI cannot connect to Ollama: Ensure OLLAMA_BASE_URL=http://ollama:11434 and that both services run on the same default Compose network. Check logs with docker logs open-webui.
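A quick check from the host that the Ollama API is responding (it replies with a short "Ollama is running" message):

curl http://localhost:11434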
GPU not visible in container: Confirm nvidia-smi works on host. Verify toolkit with docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi. If Compose does not support GPUs, update to the latest Docker and use gpus: all or start Ollama once with docker run --gpus all to validate.
“could not load libcuda” or CUDA errors: Upgrade to a newer NVIDIA driver, restart Docker, and ensure nvidia-container-toolkit is correctly configured. Run sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker.
Permission denied on Docker: Add your user to the docker group (sudo usermod -aG docker $USER) and re-login.
Memory or OOM kills: Use smaller models, reduce concurrent sessions, or increase swap. On multi-GPU hosts you can also control which GPUs Ollama sees by setting CUDA_VISIBLE_DEVICES in the ollama service's environment.
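If you need more swap, a common approach on Ubuntu is a swap file (8G here is just an example; size it to your workload):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab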
What’s Next
Explore model variants (chat, vision, embedding) via Ollama's registry, enable HTTPS with a reverse proxy, and connect automation via the OpenAI-compatible API endpoints exposed by Open WebUI and by Ollama itself. With this setup, you get a fast, private, self-hosted AI chat experience backed by your own hardware.
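For example, a minimal chat completion request against Ollama's OpenAI-compatible endpoint (assuming llama3.1:8b has been pulled as above and you are on the host):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'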