Deploy Ollama and Open WebUI on Docker with NVIDIA GPU Acceleration (Ubuntu Guide)

Overview

This step-by-step guide shows how to deploy Ollama (for running local LLMs) together with Open WebUI (a modern, browser-based interface) using Docker on Ubuntu. We will enable NVIDIA GPU acceleration for faster inference, set up persistent storage, and cover useful operational tips. By the end, you will have a local, privacy-friendly AI stack that you can run offline and manage easily.

Prerequisites

- Ubuntu 22.04 or 24.04 with sudo access
- An NVIDIA GPU with recent drivers (535+ recommended)
- At least 16 GB RAM for larger models; ensure enough disk space for model files
- Internet access to pull Docker images and models
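
Before installing anything, a quick optional check confirms the basics are in place (this assumes the NVIDIA driver is already installed; install it first if nvidia-smi fails):

nvidia-smi        # driver loaded and GPU visible
free -h           # available RAM
df -h /           # free disk space for Docker images and model files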

1) Install Docker and Docker Compose

Install Docker from the official repository so you get the current engine and the Compose plugin. The final usermod command lets you run docker without sudo; newgrp docker applies the group change to the current shell (or log out and back in):

sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
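
To confirm the installation, check the versions and run the hello-world image:

docker --version
docker compose version
docker run --rm hello-world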

2) Install NVIDIA Drivers and Container Toolkit

Make sure the proprietary NVIDIA driver is installed and working (verify with nvidia-smi). Then install the NVIDIA Container Toolkit so Docker can access the GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Test GPU visibility inside containers:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

3) Create the Docker Compose file

Create a project directory and a docker-compose.yml with two services: ollama and open-webui. This configuration exposes Open WebUI on port 8080 and mounts persistent volumes for both services.

mkdir -p ~/ollama-stack && cd ~/ollama-stack
nano docker-compose.yml

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2
    runtime: nvidia

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_NAME=Local LLM Console
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
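
Before launching, you can optionally validate the file: docker compose config parses the YAML and prints the fully resolved configuration, so indentation or key errors surface immediately.

docker compose config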

4) Launch the stack

Start both containers with Docker Compose:

docker compose up -d

Open WebUI will be available at http://<your-server-ip>:8080. The first visit prompts you to create an admin user.
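
If the page does not load, check container status and logs; the usual first steps look like this:

docker compose ps                  # both services should show "running"
docker compose logs -f open-webui  # follow Open WebUI startup logs
docker compose logs -f ollama      # follow Ollama logs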

5) Pull and run a model

You can pull models using the Ollama CLI inside the container. For example, to fetch Llama 3 (8B) and start an interactive session:

docker exec -it ollama ollama pull llama3
docker exec -it ollama ollama run llama3

In Open WebUI, select the same model name (e.g., llama3) from the model dropdown. You can add system prompts, adjust temperature, or manage multiple models.
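
You can also talk to Ollama's REST API directly, which is handy for scripting and sanity checks. A minimal sketch against the default port, assuming llama3 was pulled in the previous step:

curl http://localhost:11434/api/tags        # list locally available models
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'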

6) Persist, update, and back up

- Persistent data: All models live in the ollama-data volume, and Open WebUI settings live in openwebui-data.
- Update images: docker compose pull && docker compose up -d
- Back up volumes: Compose prefixes named volumes with the project name (here ollama-stack_ollama-data; confirm with docker volume ls), so a backup looks like docker run --rm -v ollama-stack_ollama-data:/data -v $(pwd):/backup busybox tar czf /backup/ollama-data.tgz /data. A restore sketch follows below.
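
To restore that archive onto a fresh host, the reverse operation looks roughly like this (the volume and file names are the ones used above; adjust them to your project):

docker volume create ollama-stack_ollama-data
docker run --rm -v ollama-stack_ollama-data:/data -v $(pwd):/backup busybox tar xzf /backup/ollama-data.tgz -C /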

7) Optional: restrict access and add HTTPS

For a single-user setup, it is safer to bind Open WebUI to localhost and use an SSH tunnel. Change the Open WebUI service port mapping from "8080:8080" to "127.0.0.1:8080:8080" and restart. Then access it with ssh -L 8080:localhost:8080 user@server.
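
For reference, only the host side of the port mapping changes in docker-compose.yml:

    ports:
      - "127.0.0.1:8080:8080"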

If you need public access with HTTPS, place a reverse proxy (e.g., Caddy or Nginx) in front. With Caddy, a minimal site block looks like:

ai.example.com {
    reverse_proxy 127.0.0.1:8080
}

Point DNS to your server and Caddy will fetch certificates automatically. Add HTTP auth or an allowlist if the instance is exposed to the internet.
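
If the instance must stay reachable from the internet, a minimal sketch of basic authentication in Caddy (the directive is basic_auth in recent Caddy releases, basicauth in older ones; generate the hash with caddy hash-password and paste it in place of the placeholder):

ai.example.com {
    basic_auth {
        admin <hash-from-caddy-hash-password>
    }
    reverse_proxy 127.0.0.1:8080
}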

8) Troubleshooting

Docker container cannot see the GPU: Confirm the host runs nvidia-smi successfully. Reinstall the NVIDIA Container Toolkit, run sudo nvidia-ctk runtime configure --runtime=docker, then sudo systemctl restart docker. Also ensure your compose file sets runtime: nvidia or uses --gpus all if running directly with docker run.
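
A few commands usually narrow this down quickly:

nvidia-smi                        # driver works on the host
docker info | grep -i runtime     # "nvidia" should appear among the registered runtimes
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi   # GPU visible inside a container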

Out of memory or slow performance: Start with smaller models (e.g., 3B–7B). For better throughput, set num_ctx and num_gpu options when creating models with Ollama, and close other GPU-heavy apps.
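
As a sketch, both options can be baked into a derived model with a Modelfile (the name llama3-tuned and the values below are illustrative; num_gpu controls how many layers are offloaded to the GPU):

cat > Modelfile <<'EOF'
FROM llama3
PARAMETER num_ctx 4096
PARAMETER num_gpu 99
EOF
docker cp Modelfile ollama:/tmp/Modelfile
docker exec -it ollama ollama create llama3-tuned -f /tmp/Modelfile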

Port conflicts: Change host port mappings in the compose file, e.g., "8081:8080" for Open WebUI or "11435:11434" for Ollama, and redeploy.

9) Notes for AMD/Apple users

This tutorial targets NVIDIA on Linux. For AMD GPUs on Linux, investigate ROCm builds of Ollama and ensure your GPU is supported by ROCm. On Apple Silicon, you can run both services natively or with Docker Desktop; GPU acceleration leverages Apple’s Metal backend automatically in the native build.
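
For AMD, Ollama publishes a ROCm image; a single-container sketch (without Compose) based on the upstream docs looks like this; verify the device paths and that ROCm supports your card:

docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama-data:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm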

Wrap-up

You’ve deployed a modern local AI stack with Ollama and Open WebUI using Docker and enabled NVIDIA GPU acceleration on Ubuntu. With persistent storage, easy upgrades, and optional HTTPS, this setup is production-friendly for personal research, prototyping, and helpdesk automations. Experiment with different models, tweak performance flags, and keep your system updated for the best results.
