Running large language models locally is now practical and fast, especially with GPU acceleration. In this tutorial, you will deploy Ollama and Open WebUI on Ubuntu 22.04/24.04 using Docker Compose. This stack gives you a private, browser-based interface for modern LLMs (Llama, Mistral, Phi, etc.) with one-click model management and secure, self-hosted inference.
Why this stack?
Ollama simplifies downloading, quantizing, and serving LLMs on your machine. Open WebUI adds a clean chat interface, prompt templates, file uploads, and multi-user access. Together, they provide a robust local AI setup that is easy to update and portable across servers.
Prerequisites
- Ubuntu Server 22.04 or 24.04 (fresh system recommended)
- An NVIDIA GPU with recent drivers (T4, RTX 20/30/40, A-series, etc.)
- sudo access and an internet connection
- Optional: a domain name for HTTPS (e.g., ai.example.com)
Step 1 — Install Docker Engine and Compose
Install Docker from the official repository to ensure up-to-date features like GPU support in Docker Compose.
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
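Optionally, confirm the daemon can pull and run a container without sudo (the group change from newgrp applies to the current shell):
# Optional sanity check: the daemon should pull and run this image successfully
docker run --rm hello-world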
Step 2 — Enable GPU with NVIDIA Container Toolkit
Install the NVIDIA Container Toolkit to pass the GPU into containers. Verify that the host can see the GPU with nvidia-smi before proceeding.
# If you don't have drivers:
# sudo ubuntu-drivers install && sudo reboot
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Sanity check
nvidia-smi
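To confirm Docker can actually hand the GPU to a container (the NVIDIA runtime injects nvidia-smi into it), run a throwaway container:
# Throwaway container GPU check; should print the same table as the host's nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi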
Step 3 — Create the Docker Compose stack
We will run two services: Ollama (backend API on port 11434) and Open WebUI (frontend on port 3000) connected via a Docker network. The compose file also enables GPU support for Ollama.
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
cat > docker-compose.yml <<'YAML'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # The service-level "gpus" attribute needs a recent Docker Compose plugin;
    # on older versions use the deploy.resources.reservations.devices syntax instead.
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=1h
      - OLLAMA_HOST=0.0.0.0
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
YAML
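Before launching, you can have Compose parse the file and print the resolved configuration, which catches indentation and schema mistakes early:
# Optional: validate the compose file and preview the resolved configuration
docker compose config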
Step 4 — Launch and access Open WebUI
Start the stack and watch logs for any errors. The first launch will pull images.
docker compose up -d
docker compose logs -f --tail=100
Open your browser to http://SERVER_IP:3000. Create the first admin user when prompted. Open WebUI will automatically detect Ollama via the internal URL and list any models you pull.
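If the UI reports a connection error, first confirm the Ollama API itself responds from the host; the /api/tags endpoint returns the locally stored models (an empty list on a fresh install):
# The Ollama API should answer even before any model has been pulled
curl http://localhost:11434/api/tags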
Step 5 — Pull a model and test
Use either the WebUI model manager or the CLI to fetch models. The example below pulls a popular 8B model.
# Pull a model by running the Ollama CLI inside the container
docker exec -it ollama ollama pull llama3.1:8b
# Quick API smoke test
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello from a local LLM.",
  "stream": false
}'
In Open WebUI, select the model from the dropdown and start chatting. If you have enough VRAM, consider larger quantized models (e.g., 13B/70B at Q4/Q5) for stronger reasoning.
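To double-check what is installed and whether a loaded model is actually running on the GPU rather than the CPU, query the Ollama CLI inside the container:
# List downloaded models, then show loaded models and whether they sit on GPU or CPU
docker exec -it ollama ollama list
docker exec -it ollama ollama ps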
Optional — Secure with a Caddy reverse proxy and HTTPS
If you have a domain, use Caddy to obtain and renew TLS automatically. This example exposes Open WebUI securely on port 443 and keeps Ollama private.
# If apt cannot find the caddy package (e.g., on Ubuntu 22.04), add Caddy's official apt repository first (see caddyserver.com/docs/install)
sudo apt install -y caddy
sudo tee /etc/caddy/Caddyfile >/dev/null <<'CADDY'
ai.example.com {
    encode zstd gzip
    reverse_proxy 127.0.0.1:3000
}
CADDY
sudo systemctl reload caddy
Point your DNS A/AAAA record at the server so certificate issuance can succeed (Caddy retries until it does), then visit https://ai.example.com. For teams, keep WebUI authentication enabled (already set) and create users from the admin settings.
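Once DNS resolves, a headers-only request is a quick smoke test of the certificate and the proxy (substitute your own domain):
# Expect a 200 (or a redirect to the login page) served over a valid certificate
curl -I https://ai.example.com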
Back up and update
To back up your models and chats, save the named volumes. You can also snapshot the folders from the host.
# Export volumes to tarballs. Compose prefixes named volumes with the project
# name (the directory name, here ollama-openwebui); confirm with: docker volume ls
docker run --rm -v ollama-openwebui_ollama:/v -v "$(pwd)":/b busybox tar czf /b/ollama-vol.tgz -C /v .
docker run --rm -v ollama-openwebui_openwebui:/v -v "$(pwd)":/b busybox tar czf /b/openwebui-vol.tgz -C /v .
# Update images safely
docker compose pull
docker compose up -d
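To restore on a new host, a minimal sketch, assuming the same project directory name and the tarballs sitting in the current directory: create the containers and volumes first, unpack the data, then start the stack.
# Create containers/volumes without starting, restore the data, then bring the stack up
docker compose create
docker run --rm -v ollama-openwebui_ollama:/v -v "$(pwd)":/b busybox tar xzf /b/ollama-vol.tgz -C /v
docker run --rm -v ollama-openwebui_openwebui:/v -v "$(pwd)":/b busybox tar xzf /b/openwebui-vol.tgz -C /v
docker compose up -d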
Troubleshooting
- No GPU detected: Ensure nvidia-smi works on the host. Re-run nvidia-ctk runtime configure, restart Docker, and verify the container sees the GPU:
docker exec -it ollama bash -lc 'nvidia-smi || ls -l /dev/nvidia*'
- Slow generation: Use quantized models (Q4_K_M/Q5_K_M), avoid oversize context windows, and confirm GPU is actually used (GPU utilization should rise in nvidia-smi during inference).
- Port conflicts: Change mapped ports in docker-compose.yml, e.g., "3001:8080" for Open WebUI or put a reverse proxy in front.
- Permission errors on volumes: Ensure your user is in the docker group and that the Docker daemon can write to the volume paths.
Security tips
- Keep Ollama bound to the internal network and only expose Open WebUI through TLS.
- Enable authentication (already set via WEBUI_AUTH=True). Use strong passwords and consider putting Open WebUI behind a VPN or SSO.
- Restrict inbound traffic with UFW: allow only 22/tcp and 443/tcp and deny the rest (see the sketch below). Note that ports published by Docker bypass UFW's rules, so when a reverse proxy is in front, also bind the published ports to the loopback interface (e.g., "127.0.0.1:3000:8080" and "127.0.0.1:11434:11434").
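A minimal UFW sketch for the policy above (allow SSH before enabling so you don't lock yourself out):
# Default-deny inbound, allow SSH and HTTPS, then turn the firewall on
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose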
Conclusion
You now have a GPU-accelerated, private AI stack with Ollama and Open WebUI on Ubuntu, orchestrated by Docker Compose. It is easy to upgrade, portable across servers, and suitable for personal research or team deployments. With this foundation, you can iterate quickly, evaluate new models as they drop, and keep your data fully on-prem.