Deploy Ollama and Open WebUI with NVIDIA GPU on Ubuntu 24.04 using Docker Compose

Overview

This tutorial shows how to self-host large language models locally by deploying Ollama and Open WebUI on Ubuntu 24.04 LTS with NVIDIA GPU acceleration using Docker Compose. Ollama handles model runtimes and downloads, while Open WebUI provides a friendly web interface, prompt management, and multi-user features. By the end, you will have a reproducible, GPU-enabled AI stack reachable from a browser on your LAN.

Prerequisites

Before you start, confirm: (1) Ubuntu 24.04 LTS installed and updated, (2) An NVIDIA GPU supported by recent drivers, (3) Administrative (sudo) access, and (4) Internet connectivity. If Secure Boot is enabled, you may need to enroll the NVIDIA kernel module signing key during driver installation.
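
If you prefer to verify these points from a terminal first, the following checks are one way to do it (mokutil is only needed for the Secure Boot check and may not be installed by default):

lsb_release -d                 # should report Ubuntu 24.04 LTS
lspci | grep -i nvidia         # confirms an NVIDIA GPU is visible on the PCI bus
mokutil --sb-state             # reports whether Secure Boot is enabled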

Step 1: Prepare Ubuntu

sudo apt update && sudo apt -y upgrade
sudo apt -y install curl git ca-certificates gnupg

Step 2: Install the NVIDIA Driver

Use Ubuntu’s built-in tool to select the correct, current driver:

ubuntu-drivers list
sudo ubuntu-drivers autoinstall
sudo reboot

After reboot, verify the GPU is recognized:

nvidia-smi
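
If you want a more targeted check than the full nvidia-smi table, the query below is one way to print just the GPU name, driver version, and available VRAM:

nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv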

Step 3: Install Docker Engine and Compose

Add the official Docker repository and install the engine plus the Compose plugin:

sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
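
Note that newgrp only applies the group change to the current shell; log out and back in for it to take effect everywhere. A quick smoke test such as the following should then work without sudo:

docker run --rm hello-world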

Step 4: Install NVIDIA Container Toolkit

This toolkit lets Docker containers access the GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update
sudo apt -y install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
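
Before testing a CUDA container, you can confirm that nvidia-ctk actually registered the runtime with Docker; inspecting the daemon configuration is one way to do so:

docker info | grep -i runtimes    # "nvidia" should appear in the list of runtimes
cat /etc/docker/daemon.json       # nvidia-ctk writes the runtime entry to this file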

Test GPU access in a container:

docker run --rm --gpus all nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi

Step 5: Create the Docker Compose project

Make a working directory and create your Compose file:

mkdir -p ~/ai-stack && cd ~/ai-stack

Create a file named docker-compose.yaml with the following content (indentation matters):

services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0
    volumes:
      - ollama:/root/.ollama
    gpus: all

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_NAME=MyLocalAI
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:

Step 6: Launch the stack

docker compose up -d

Wait a few seconds for the containers to start, then visit http://localhost:3000 (or http://SERVER_IP:3000) to open Open WebUI. The first account you create becomes the admin. You can manage models from the UI or the CLI.
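
If the page does not load, a few quick checks help narrow things down; the commands below assume the default ports from the Compose file above:

docker compose ps                       # both services should show as running
docker compose logs -f openwebui        # follow the Open WebUI startup logs
curl http://localhost:11434/api/tags    # Ollama API endpoint that lists installed models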

Step 7: Pull a model and test

Pull a model into the Ollama volume (example: Meta’s Llama 3.1 8B):

docker compose exec ollama ollama pull llama3.1
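
To confirm the model actually landed in the Ollama volume, you can list what is installed locally:

docker compose exec ollama ollama list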

Generate a quick response from the API (jq is used here to extract the reply; install it with sudo apt -y install jq if it is missing):

curl -s http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Say hello from a local GPU!","stream":false}' | jq -r .response

In Open WebUI, choose the model from the dropdown and start chatting.

Step 8: Use the OpenAI-compatible API

Ollama exposes an OpenAI-style API under /v1. Example with Python’s OpenAI SDK:

pip install openai
python - <<'PY'
from openai import OpenAI
# The api_key value is required by the SDK but not checked by Ollama
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role":"user","content":"Give me a one-line fun fact."}],
)
print(resp.choices[0].message.content)
PY
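
The same endpoint also works with plain curl, which is handy for quick checks; this mirrors the Python example above (Ollama ignores the API key, but clients that insist on one can send any value):

curl -s http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"llama3.1","messages":[{"role":"user","content":"Give me a one-line fun fact."}]}' | jq -r '.choices[0].message.content'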

Maintenance and updates

To update images without losing your data or models stored in volumes, run:

cd ~/ai-stack
docker compose pull
docker compose up -d

To update or add models, use the UI or the CLI, for example: docker compose exec ollama ollama pull mistral.
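
Old images and unused models accumulate over time; the commands below are one way to reclaim space (swap mistral for whichever model you no longer need):

docker compose exec ollama ollama rm mistral    # delete a model you no longer use
docker image prune -f                           # remove dangling images left behind by updates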

Troubleshooting

nvidia-smi fails inside containers: Ensure the NVIDIA driver is installed and matches your GPU. Reboot after installation. Confirm Docker sees the GPU with docker run --rm --gpus all nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi.

Compose “gpus” not recognized: Check your Compose plugin version with docker compose version. Update Docker packages if outdated. As a fallback, configure the NVIDIA runtime as default and remove the gpus key: sudo nvidia-ctk runtime configure --runtime=docker --set-as-default && sudo systemctl restart docker.

Slow downloads or OOM: Models are large; use a fast, stable network and ensure enough VRAM and RAM. If a model does not fit your GPU, choose a smaller or more heavily quantized variant (e.g., llama3.2:3b, or a q4_K_M quantization of the model).
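
When diagnosing memory problems, it helps to watch GPU usage while a prompt is running; for example:

watch -n 1 nvidia-smi             # refreshes GPU utilization and VRAM usage every second
docker compose logs -f ollama     # model load messages and out-of-memory errors appear here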

Security tips

Bind the services to your LAN or localhost by default and place them behind a reverse proxy with TLS if exposing over the internet. In Open WebUI, enable authentication and restrict new user registration if not needed. Consider a firewall rule to limit access to ports 11434 and 3000.
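
As an illustration, assuming ufw is in use and your LAN is 192.168.1.0/24 (adjust to your network), rules like these restrict both ports to local clients; for localhost-only access you can instead prefix the port mappings in the Compose file with 127.0.0.1 (for example, "127.0.0.1:3000:8080"):

sudo ufw allow ssh                                                 # keep SSH reachable before enabling the firewall
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp      # Open WebUI, LAN only
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp     # Ollama API, LAN only
sudo ufw enable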

Remove the stack

To stop and remove the containers while preserving data in the named volumes: docker compose down. To remove everything, including downloaded models and chat history: docker compose down -v.
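
Compose prefixes volume names with the project name (here, the directory name ai-stack), so if you want to see exactly what down -v would delete, you can list the volumes first:

docker volume ls --filter name=ai-stack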

You now have a modern, GPU-accelerated local AI stack that is easy to manage and update. The same approach works for additional services like vector databases or reverse proxies, making it a flexible foundation for on-prem AI experiments and production prototypes.
