How to Self-Host Ollama and Open WebUI with NVIDIA GPU on Ubuntu 22.04/24.04

Overview

This step-by-step guide shows you how to self-host Ollama with Open WebUI on Ubuntu 22.04/24.04 and use your NVIDIA GPU for fast, private large language model (LLM) inference. You will install the correct NVIDIA driver, Docker, and the NVIDIA Container Toolkit, then deploy Ollama and Open WebUI with Docker Compose. The tutorial also covers updating, backing up models, and troubleshooting common problems such as the GPU not being visible to containers and port conflicts.

Prerequisites

Before you begin, make sure you have: (1) Ubuntu 22.04 or 24.04 with sudo access, (2) an NVIDIA GPU with at least 6 GB VRAM for medium models (smaller models can work with less), (3) a stable internet connection, and (4) at least 20 GB free disk space for images and model files.
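
To quickly confirm these prerequisites, you can run a few standard commands on the host (lspci only shows that an NVIDIA GPU is present, not its VRAM size):

# Ubuntu release, free disk space on the root filesystem, and detected NVIDIA GPU
lsb_release -ds
df -h /
lspci | grep -i nvidia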

Step 1: Install NVIDIA Driver and Verify CUDA

Use Ubuntu’s built-in tool to install a matching proprietary driver. If Secure Boot is enabled, you may need to enroll a Machine Owner Key (MOK) during installation to load the NVIDIA kernel module.
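
If you want to check Secure Boot and see which driver Ubuntu recommends before installing, the following commands help (mokutil is usually preinstalled; install it with apt if not):

# list detected GPUs and the recommended driver package
ubuntu-drivers devices
# report whether Secure Boot is enabled
mokutil --sb-state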

sudo apt update
sudo ubuntu-drivers install
sudo reboot

After the reboot, confirm the driver is active:

nvidia-smi

You should see a table with your GPU and driver version. If you get an error, check Secure Boot (disable it or enroll the NVIDIA module), then repeat the install.

Step 2: Install Docker Engine and NVIDIA Container Toolkit

Install Docker from the official repository and add your user to the docker group so you can run containers without sudo.

sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
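
Before moving on, it is worth confirming that Docker and the Compose plugin are installed and that your user can run containers without sudo:

docker --version
docker compose version
docker run --rm hello-world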

Add the NVIDIA Container Toolkit so containers can use your GPU (the same repository setup works on both Ubuntu 22.04 and 24.04):

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify Docker can see your GPU:

docker run --rm --gpus all nvidia/cuda:12.5.0-base-ubuntu22.04 nvidia-smi

Step 3: Deploy Ollama and Open WebUI with Docker Compose

We will bind both services to localhost for safety. You can put a reverse proxy in front later for remote access.

mkdir -p ~/ollama-stack && cd ~/ollama-stack
nano compose.yaml

Paste the following compose file (save and exit):

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=1
    volumes:
      - ollama:/root/.ollama
    gpus: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  # explicit volume names so later docker commands (backups, plain docker run) can reference them directly
  ollama:
    name: ollama
  openwebui:
    name: openwebui

Start the stack:

docker compose up -d
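
You can confirm both containers are running and skim Ollama's startup log, which normally reports the detected GPU:

docker compose ps
docker compose logs ollama | tail -n 20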

Pull a model into Ollama (example: a small, fast model):

docker exec -it ollama ollama pull llama3.2:3b

Quick API test (streaming is disabled so the reply comes back as a single JSON object):

curl http://127.0.0.1:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","prompt":"Say hello in one sentence.","stream":false}'

Open your browser at http://127.0.0.1:3000, create the initial admin account when prompted, pick the model you pulled from the model selector, and start chatting. Open WebUI connects to Ollama automatically through the OLLAMA_BASE_URL set in the compose file.
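
If the page does not load, a quick check from the terminal tells you whether Open WebUI is answering on the bound port:

curl -sI http://127.0.0.1:3000 | head -n 1
docker logs --tail 20 open-webui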

Step 4: Updates and Backups

To update Ollama and Open WebUI to the latest images while keeping your models and data, run:

cd ~/ollama-stack
docker compose pull
docker compose up -d

Back up volumes (models and WebUI data) with a simple tar archive:

docker stop open-webui ollama
docker run --rm -v ollama:/data -v "$PWD":/backup alpine \
  sh -c 'tar czf /backup/ollama-vol.tar.gz -C /data .'
docker run --rm -v openwebui:/data -v "$PWD":/backup alpine \
  sh -c 'tar czf /backup/openwebui-vol.tar.gz -C /data .'
docker start ollama open-webui
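
To restore on the same or a new machine, stop the containers, unpack each archive back into its volume, and start the stack again. This mirrors the backup commands above and assumes the two .tar.gz files are in the current directory:

docker stop open-webui ollama
docker run --rm -v ollama:/data -v "$PWD":/backup alpine \
  sh -c 'tar xzf /backup/ollama-vol.tar.gz -C /data'
docker run --rm -v openwebui:/data -v "$PWD":/backup alpine \
  sh -c 'tar xzf /backup/openwebui-vol.tar.gz -C /data'
docker start ollama open-webui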

Troubleshooting

No CUDA-capable device detected: Ensure the NVIDIA driver is loaded (nvidia-smi works on the host). If Secure Boot is on, enroll the MOK or disable Secure Boot. Confirm the container sees the GPU with the CUDA test image. Re-run: sudo nvidia-ctk runtime configure --runtime=docker and restart Docker.
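
A short diagnostic pass usually narrows this down; each command checks one layer of the stack (driver, kernel module, Docker runtime, container access):

# host driver and kernel module
nvidia-smi
lsmod | grep nvidia
# Docker should list an "nvidia" runtime after nvidia-ctk configure
docker info | grep -i runtime
# GPU access from inside a container
docker run --rm --gpus all nvidia/cuda:12.5.0-base-ubuntu22.04 nvidia-smi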

Compose error: unknown field "gpus": Your Docker Compose version is too old to understand the gpus field. Update Docker, or remove gpus: all from compose.yaml and start Ollama manually with: docker run -d --gpus all -p 127.0.0.1:11434:11434 -v ollama:/root/.ollama --name ollama ollama/ollama:latest
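
If you prefer to keep the GPU configuration in the compose file, older Compose releases generally accept the device-reservation form instead of gpus: all; a sketch of the ollama service with only that part changed:

services:
  ollama:
    # ...same image, ports, environment and volumes as above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]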

Port already in use: Change the host ports in compose.yaml (for example, 127.0.0.1:11435:11434 and 127.0.0.1:3001:8080) and re-run docker compose up -d.
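
To find out which process is holding a port before changing it, you can list listening sockets on the host:

sudo ss -ltnp | grep -E ':3000|:11434'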

Out-of-memory or slow responses: Choose a smaller or more quantized model (e.g., llama3.2:1b or a Q4 version if available). Limit parallel requests with OLLAMA_NUM_PARALLEL=1. Ensure you have adequate swap configured on the host for large models.
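
Two quick checks show how much VRAM is in use and whether Ollama placed the model on the GPU or fell back to CPU:

nvidia-smi
docker exec -it ollama ollama ps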

Security Tips

Keep services bound to 127.0.0.1 and place a reverse proxy with TLS in front (Caddy, Traefik, or Nginx) if you need remote access. For Open WebUI, enable authentication in its settings. Restrict firewall rules to only allow your reverse proxy and management IPs. Regularly update images and prune unused layers with docker system prune -af.
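
As an illustration, a minimal ufw sketch for the common case where the reverse proxy runs on the same host and terminates TLS on port 443; 203.0.113.10 is a placeholder for your own management IP, so adjust before enabling:

sudo ufw default deny incoming
sudo ufw allow OpenSSH
# allow HTTPS to the reverse proxy only from a trusted address (placeholder IP)
sudo ufw allow from 203.0.113.10 to any port 443 proto tcp
sudo ufw enable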

Clean Uninstall

To remove the stack and its volumes (this deletes downloaded models and chat data), run:

cd ~/ollama-stack
docker compose down -v
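
If you also want to delete the downloaded container images in the same step, Compose can remove them along with the stack:

docker compose down -v --rmi all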

Conclusion

You have a fully private, GPU-accelerated local AI setup with Ollama and Open WebUI running on Ubuntu. This stack is easy to update, simple to back up, and flexible: you can try multiple models, script against the API, or place it behind a secure reverse proxy for team access. With one machine and an NVIDIA GPU, you now own your LLM workflow end to end.
