Deploy Ollama + Open WebUI on Ubuntu with GPU Acceleration using Docker Compose

Running large language models locally is now practical and fast, especially with GPU acceleration. In this tutorial, you will deploy Ollama and Open WebUI on Ubuntu 22.04/24.04 using Docker Compose. This stack gives you a private, browser-based interface for modern LLMs (Llama, Mistral, Phi, etc.) with one-click model management and secure, self-hosted inference.

Why this stack?

Ollama simplifies downloading, quantizing, and serving LLMs on your machine. Open WebUI adds a clean chat interface, prompt templates, file uploads, and multi-user access. Together, they provide a robust local AI setup that is easy to update and portable across servers.

Prerequisites

- Ubuntu Server 22.04 or 24.04 (fresh system recommended)

- An NVIDIA GPU with recent drivers (T4, RTX 20/30/40, A-series, etc.)

- sudo access and an internet connection

- Optional: a domain name for HTTPS (e.g., ai.example.com)

Step 1 — Install Docker Engine and Compose

Install Docker from the official repository to ensure up-to-date features like GPU support in Docker Compose.

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
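
# Optional sanity check (a quick test, not required): confirm the daemon can run containers
docker run --rm hello-world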

Step 2 — Enable GPU with NVIDIA Container Toolkit

Install the NVIDIA Container Toolkit to pass the GPU into containers. Verify that the host can see the GPU with nvidia-smi before proceeding.

# If you don't have drivers:
# sudo ubuntu-drivers install && sudo reboot

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check
nvidia-smi
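
# Optional: confirm containers can see the GPU (runs nvidia-smi inside a throwaway container)
docker run --rm --gpus all ubuntu nvidia-smi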

Step 3 — Create the Docker Compose stack

We will run two services: Ollama (backend API on port 11434) and Open WebUI (frontend on port 3000) connected via a Docker network. The compose file also enables GPU support for Ollama.

mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
cat > docker-compose.yml <<'YAML'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=1h
      - OLLAMA_HOST=0.0.0.0

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
YAML
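
Note: the top-level gpus: all attribute requires a reasonably recent Compose plugin. If your version rejects it, a common alternative is to request the GPU through a deploy block on the ollama service instead, roughly like this:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]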

Step 4 — Launch and access Open WebUI

Start the stack and watch logs for any errors. The first launch will pull images.

docker compose up -d
docker compose logs -f --tail=100

Open your browser to http://SERVER_IP:3000. Create the first admin user when prompted. Open WebUI will automatically detect Ollama via the internal URL and list available models.
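
An empty model list is expected before the first pull. You can also confirm the Ollama API is reachable from the host; the endpoint below returns the locally installed models as JSON:

curl http://localhost:11434/api/tags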

Step 5 — Pull a model and test

Use either the WebUI model manager or the CLI to fetch models. The example below pulls a popular 8B model.

# Pull a model by running the Ollama CLI inside the container
docker exec -it ollama ollama pull llama3.1:8b

# Quick API smoke test
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello from a local LLM.",
  "stream": false
}'

In Open WebUI, select the model from the dropdown and start chatting. If you have enough VRAM, consider quantized larger models (e.g., 13B/70B Q4/Q5) for better reasoning.
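
For example, you could pull a larger default-quantized build and review what is installed afterwards (tags change over time, so check the Ollama model library for exact names):

docker exec -it ollama ollama pull llama3.1:70b
docker exec -it ollama ollama list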

Optional — Secure with a Caddy reverse proxy and HTTPS

If you have a domain, use Caddy to obtain and renew TLS certificates automatically. This example exposes Open WebUI securely on port 443; Ollama is not proxied at all, and the Security tips below cover locking down its published port.

sudo apt install -y caddy
sudo tee /etc/caddy/Caddyfile >/dev/null <<'CADDY'
ai.example.com {
  encode zstd gzip
  reverse_proxy 127.0.0.1:3000
}
CADDY
sudo systemctl reload caddy

Point your DNS A/AAAA record to the server. Then visit https://ai.example.com. For teams, enable WebUI auth (already set) and create users from the admin settings.
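
Once traffic flows through Caddy, you may also want Open WebUI reachable only via the proxy. One optional tweak is to bind its published port to localhost in docker-compose.yml:

    ports:
      - "127.0.0.1:3000:8080"

Re-run docker compose up -d afterwards to apply the change.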

Back up and update

To back up your models and chats, save the named volumes. You can also snapshot the volume directories (under /var/lib/docker/volumes by default) directly from the host.

# Export volumes to tarballs (Compose prefixes volume names with the project/folder name;
# confirm the exact names with docker volume ls)
docker run --rm -v ollama-openwebui_ollama:/v -v $(pwd):/b busybox tar czf /b/ollama-vol.tgz -C /v .
docker run --rm -v ollama-openwebui_openwebui:/v -v $(pwd):/b busybox tar czf /b/openwebui-vol.tgz -C /v .
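
# To restore on a fresh host, unpack the tarballs back into the (project-prefixed) volumes
docker run --rm -v ollama-openwebui_ollama:/v -v $(pwd):/b busybox tar xzf /b/ollama-vol.tgz -C /v
docker run --rm -v ollama-openwebui_openwebui:/v -v $(pwd):/b busybox tar xzf /b/openwebui-vol.tgz -C /v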

# Update images safely
docker compose pull
docker compose up -d

Troubleshooting

- No GPU detected: Ensure nvidia-smi works on the host. Re-run nvidia-ctk runtime configure, restart Docker, and verify the container sees the GPU:

docker exec -it ollama bash -lc 'nvidia-smi || ls -l /dev/nvidia*'

- Slow generation: Use quantized models (Q4_K_M/Q5_K_M), avoid oversize context windows, and confirm GPU is actually used (GPU utilization should rise in nvidia-smi during inference).
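
# Check whether the loaded model is running on the GPU (see the PROCESSOR column)
docker exec -it ollama ollama ps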

- Port conflicts: Change mapped ports in docker-compose.yml, e.g., "3001:8080" for Open WebUI or put a reverse proxy in front.

- Permission errors on volumes: Ensure your user is in the docker group and that the Docker daemon can write to the volume paths.

Security tips

- Keep Ollama off the public internet: bind its published port to localhost in docker-compose.yml (e.g., "127.0.0.1:11434:11434") or remove the 11434 mapping entirely, and expose only Open WebUI through TLS.

- Enable authentication (already set via WEBUI_AUTH=True). Use strong passwords and consider putting Open WebUI behind a VPN or SSO.

- Restrict firewall ports using UFW: allow 22/tcp and 443/tcp, then deny others.
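
A minimal UFW policy along those lines (adjust the SSH rule if you use a non-default port):

sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose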

Conclusion

You now have a GPU-accelerated, private AI stack with Ollama and Open WebUI on Ubuntu, orchestrated by Docker Compose. It is easy to upgrade, portable across servers, and suitable for personal research or team deployments. With this foundation, you can iterate quickly, evaluate new models as they drop, and keep your data fully on-prem.
