Local large language models (LLMs) have matured rapidly, and running them with GPU acceleration on your own server is now simple. In this step-by-step tutorial, you will deploy Ollama (model runtime) and Open WebUI (a friendly chat interface) on Ubuntu 22.04/24.04 using Docker Compose and the NVIDIA Container Toolkit.
Prerequisites
- An Ubuntu 22.04 or 24.04 machine with an NVIDIA GPU (Turing or newer recommended) and internet access.
- Administrative (sudo) access.
- Basic familiarity with the terminal and Docker.
1) Install NVIDIA Driver
First, make sure your system is up to date, then install the recommended NVIDIA driver. If you are already on the correct proprietary driver, you can skip this step.
sudo apt update && sudo apt upgrade -y
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the GPU is available:
nvidia-smi
You should see a table with your GPU model and driver version.
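If autoinstall did not pick the driver you wanted (or you prefer to choose one explicitly), you can list Ubuntu's recommendations and install a specific package instead; the package name below is only an example and will vary by GPU and release:
ubuntu-drivers devices
sudo apt install -y nvidia-driver-550   # example only; pick the package marked "recommended" above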
2) Install Docker Engine
Use Docker's convenience script to install the latest Docker Engine quickly. Alternatively, follow the official apt repository instructions if you prefer to pin a specific version.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
The newgrp command applies the group change to the current shell only; log out and back in for it to take effect everywhere.
Verify Docker is working:
docker run --rm hello-world
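It is also worth confirming that the Compose v2 plugin was installed alongside the engine, since the rest of this guide uses the docker compose subcommand:
docker --version
docker compose version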
3) Install NVIDIA Container Toolkit
The NVIDIA Container Toolkit lets containers access your GPU. Install it and configure Docker to use the NVIDIA runtime.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
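Before testing a container, you can confirm that Docker registered the NVIDIA runtime:
docker info | grep -i runtimes
The output should list nvidia alongside the default runc runtime.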
Test GPU access inside a container:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
If you see your GPU, you are ready for Ollama.
4) Create a Docker Compose file
Make a project directory and create a docker-compose.yml that runs both services with persistent volumes and GPU support.
mkdir -p ~/ollama-webui && cd ~/ollama-webui
nano docker-compose.yml
version: "3.9"
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
ports:
- "11434:11434"
environment:
- OLLAMA_KEEP_ALIVE=24h
volumes:
- ollama:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
depends_on:
- ollama
environment:
- OLLAMA_API_BASE_URL=http://ollama:11434
- WEBUI_AUTH=true
ports:
- "3000:8080"
volumes:
- openwebui:/app/backend/data
volumes:
ollama:
openwebui:
Note: The deploy.resources.reservations.devices block requests GPU access through the NVIDIA driver. Docker Compose v2 honors it outside of Swarm as long as the NVIDIA Container Toolkit is installed. If your Compose version ignores it, see the troubleshooting section for alternatives.
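Before starting anything, it is worth asking Compose to validate the file; with --quiet it prints nothing on success and reports YAML or schema errors otherwise:
docker compose config --quiet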
5) Start the stack
Bring up both containers in the background:
docker compose up -d
docker compose ps
Open your browser to http://localhost:3000 (or the server’s IP on port 3000). With authentication enabled, the web interface prompts you to create an account; the first account created becomes the administrator.
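If the page does not load, check the container logs and confirm the Ollama API answers on the host; /api/version returns a small JSON payload with the server version:
docker compose logs open-webui
curl http://localhost:11434/api/version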
6) Pull and run a model
You can pull models from the UI (Models section), or via CLI inside the Ollama container. Example with Llama 3.1 (8B):
docker exec -it ollama ollama pull llama3.1:8b
Once the model is downloaded, select it in Open WebUI and start chatting. GPU memory matters: 8B models typically need ~6–8 GB VRAM; 70B needs much more. If you are low on VRAM, try smaller or quantized variants (e.g., Q4_K_M builds).
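You can also verify the model responds outside the UI by calling Ollama’s generate endpoint directly from the host; this quick sanity check assumes you pulled llama3.1:8b as above:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'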
7) Persistence, updates, and backups
- Your models and settings live in the named Docker volumes “ollama” and “openwebui.” Compose prefixes them with the project directory name, so on disk they appear under /var/lib/docker/volumes/ as, e.g., ollama-webui_ollama. To back them up, stop the stack and archive the volume data; see the sketch after this list.
- To update images: docker compose pull && docker compose up -d.
- To move the setup to another host, copy the Compose file and restore the volumes.
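A minimal backup sketch, assuming the project directory is ~/ollama-webui so Compose named the volumes ollama-webui_ollama and ollama-webui_openwebui (confirm with docker volume ls):
docker compose down
docker run --rm -v ollama-webui_ollama:/data -v "$PWD":/backup busybox tar czf /backup/ollama-volume.tgz -C /data .
docker run --rm -v ollama-webui_openwebui:/data -v "$PWD":/backup busybox tar czf /backup/openwebui-volume.tgz -C /data .
docker compose up -d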
8) Secure access (optional)
For internet exposure, place Open WebUI behind a reverse proxy with TLS (e.g., Caddy or Nginx) and keep WEBUI_AUTH=true. Consider network ACLs or a VPN like Tailscale/WireGuard for private, zero-trust access.
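As a minimal sketch of the reverse-proxy approach, assuming Caddy is installed on the host and a DNS record such as chat.example.com (a placeholder domain) points at the server:
sudo caddy reverse-proxy --from chat.example.com --to localhost:3000
Caddy obtains and renews the TLS certificate automatically; for a permanent setup, put the equivalent site block in /etc/caddy/Caddyfile rather than running the command ad hoc.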
Troubleshooting
- GPU not detected in containers: ensure nvidia-smi works on the host; re-run sudo nvidia-ctk runtime configure --runtime=docker; restart Docker; then recreate the stack with docker compose down && docker compose up -d.
- If your Compose version ignores the deploy.resources block, either set runtime: nvidia on the ollama service (see the YAML sketch at the end of this section) or launch Ollama separately with explicit CLI flags:
docker run -d --name ollama --gpus all -p 11434:11434 \
-v ollama:/root/.ollama --restart unless-stopped ollama/ollama:latest
- Performance tips: lower the model context length in Open WebUI’s model settings, avoid running multiple models at once, and monitor VRAM usage with nvidia-smi.
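As referenced above, a sketch of the runtime-based alternative, assuming step 3 registered the nvidia runtime with Docker: replace the ollama service’s deploy block with a runtime key.
  ollama:
    image: ollama/ollama:latest
    runtime: nvidia
    environment:
      # expose all GPUs to the container (harmless if the image already sets this)
      - NVIDIA_VISIBLE_DEVICES=all
      - OLLAMA_KEEP_ALIVE=24h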
With this setup, you get a modern, GPU-accelerated local LLM stack that is fast, private, and easy to maintain using Docker Compose. Enjoy building AI workflows without sending your data to the cloud.