Overview
This guide shows you how to self-host Ollama and Open WebUI on Ubuntu using Docker Compose with NVIDIA GPU acceleration. Ollama makes it easy to run popular local LLMs (like Llama 3, Mistral, Phi, and Qwen), while Open WebUI provides a clean, multi-user chat interface, prompt management, and model switching. By the end, you will have a persistent, GPU-enabled AI stack reachable in your browser, suitable for personal use or a small team.
Prerequisites
- Ubuntu 22.04 or 24.04 (server or desktop), 16 GB RAM recommended.
- An NVIDIA GPU with recent drivers (8 GB VRAM or more recommended for 7B/8B models).
- Docker Engine and the Docker Compose plugin.
- A user with sudo privileges and outbound internet access.
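Before moving on, you can confirm most of these directly on the host. The commands below are standard Ubuntu and NVIDIA tools and only read information:
lsb_release -d    # Ubuntu release
free -h           # installed RAM
nvidia-smi        # NVIDIA driver version and available VRAM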
Step 1 — Install Docker and Docker Compose
If Docker is not installed, run:
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Verify Docker works: docker version and docker compose version.
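Optionally, add your user to the docker group so the remaining commands work without sudo (log out and back in afterwards for the change to take effect):
sudo usermod -aG docker $USER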
Step 2 — Enable NVIDIA GPU in Containers
Install the NVIDIA Container Toolkit so Docker can access your GPU. First, ensure the NVIDIA driver is installed and nvidia-smi works on the host. Then run:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test inside a container: docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. You should see your GPU listed.
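If the test fails, check that the configure step actually registered the NVIDIA runtime. After nvidia-ctk runtime configure, /etc/docker/daemon.json should contain an entry roughly like the following (your file may include other settings as well):
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}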
Step 3 — Create the Docker Compose file
Make a new folder for the stack and create compose.yml in it:
mkdir -p ~/ai-stack && cd ~/ai-stack
Use this minimal Compose configuration (Ollama + Open WebUI, GPU-enabled, with persistent volumes):
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    gpus: all
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    volumes:
      - openwebui:/app/backend/data
    restart: unless-stopped

volumes:
  ollama:
  openwebui:
This setup exposes Ollama on port 11434 (API) and Open WebUI on 3000 (web). Data persists in Docker volumes, so updates do not erase models or chats.
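Note that the gpus: all shorthand requires a fairly recent Docker Compose. If your Compose version rejects it, replace gpus: all under the ollama service with the equivalent long-form device reservation:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]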
Step 4 — Launch the stack and pull a model
Start both services in the background:
docker compose up -d
Check logs to confirm GPU access and healthy startup:
docker logs -f ollama and docker logs -f open-webui
Pull your first model (example: Llama 3.1 8B) and verify inference:
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama run llama3.1:8b
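You can also confirm the API answers from the host. A quick check, assuming the llama3.1:8b pull above completed:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'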
Open a browser to http://<your_server_ip>:3000, create your admin account, choose the pulled model, and start chatting.
Step 5 — Secure access and basic hardening
Open WebUI has built-in auth. The Compose file sets WEBUI_AUTH=True, which prompts for signup on first visit. After creating the admin user, disable new registrations by adding ENABLE_SIGNUP=False under the open-webui environment and redeploy with docker compose up -d.
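For example, the open-webui environment block would then read as follows before you redeploy:
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
      - ENABLE_SIGNUP=False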
If you will expose the UI on the internet, place it behind a reverse proxy with HTTPS. For example, with Caddy on the same host, you can proxy to port 3000 and get automatic TLS:
my-ai.example.com {
    reverse_proxy 127.0.0.1:3000
}
Alternatively, use Nginx and a free TLS certificate from Let's Encrypt. Restrict access with IP allowlists or SSO if available.
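A minimal Nginx server block might look like the sketch below; it assumes a certificate already issued by Certbot for my-ai.example.com, and the Upgrade/Connection headers matter because Open WebUI uses WebSockets:
server {
    listen 443 ssl;
    server_name my-ai.example.com;
    ssl_certificate /etc/letsencrypt/live/my-ai.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/my-ai.example.com/privkey.pem;
    location / {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://127.0.0.1:3000;
    }
}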
Step 6 — Useful environment options
- OLLAMA_KEEP_ALIVE: Keeps models warm for faster first-token latency (e.g., 24h).
- WEBUI_AUTH and ENABLE_SIGNUP: Enable auth and control who can create accounts.
- OLLAMA_NUM_PARALLEL: Caps how many requests each loaded model serves at once; lower values protect VRAM.
- OPENAI_API_BASE_URL (Open WebUI): Points Open WebUI at an OpenAI-compatible endpoint; Ollama exposes one at http://ollama:11434/v1 if you need that compatibility layer. The snippet below shows where these settings go in compose.yml.
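As a sketch, the options above simply extend the environment lists of the two services; the values shown are illustrative, not recommendations:
  ollama:
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_PARALLEL=2    # illustrative value
  open-webui:
    environment:
      - WEBUI_AUTH=True
      - ENABLE_SIGNUP=False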
Step 7 — Backup and update strategy
Your chats and models live in named Docker volumes. Note that Compose prefixes volume names with the project name (by default the directory name, here ai-stack), so the volumes are typically ai-stack_ollama and ai-stack_openwebui; run docker volume ls to confirm. To back them up quickly, stop the stack and archive the volumes:
docker compose down
docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup busybox tar czf /backup/ollama-vol.tar.gz -C /data .
docker run --rm -v ai-stack_openwebui:/data -v $(pwd):/backup busybox tar czf /backup/openwebui-vol.tar.gz -C /data .
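To restore onto a fresh host, recreate the volumes (running docker compose up -d once and then docker compose down is enough) and unpack the archives back into them; adjust the volume names if your project directory differs:
docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup busybox tar xzf /backup/ollama-vol.tar.gz -C /data
docker run --rm -v ai-stack_openwebui:/data -v $(pwd):/backup busybox tar xzf /backup/openwebui-vol.tar.gz -C /data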
To update, pull the latest images and redeploy:
docker compose pull && docker compose up -d
Troubleshooting
- No GPU visible in containers: confirm nvidia-smi works on the host, that the NVIDIA Container Toolkit is installed, and that gpus: all is present under the Ollama service.
- Port in use: change 11434 or 3000 in compose.yml if conflicts arise.
- Out of memory (VRAM): try a smaller model variant (e.g., 7B/8B quantized like Q4_K_M), or reduce parallel requests. Example pull: ollama pull llama3.1:8b-instruct-q4_K_M.
- Slow first response: increase OLLAMA_KEEP_ALIVE or keep frequently used models loaded.
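When diagnosing any of the above, it helps to see what is actually loaded. Inside the container, ollama ps lists each running model and whether it is on the GPU, the CPU, or split between the two:
docker exec -it ollama ollama ps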
What you built
You now have a modern, GPU-accelerated local AI stack with Ollama and Open WebUI running on Docker Compose. It is easy to manage, fast to update, and simple to secure behind HTTPS. Add more models, enable extensions, or integrate with automation tools to turn this into a private, production-ready assistant for your workstation or team.