Overview
This step-by-step guide shows you how to run a local AI stack on Ubuntu 22.04/24.04 using Docker Compose, Ollama, and Open WebUI with NVIDIA GPU acceleration. Ollama provides a lightweight local API for popular large language models (LLMs) like Llama 3, Mistral, and Qwen, while Open WebUI delivers a clean, user-friendly chat interface. By the end, you will have a secure, updatable setup that serves a local LLM with GPU support for fast responses and offline privacy.
Prerequisites
System: Ubuntu 22.04 or 24.04 with a recent NVIDIA GPU driver installed. Aim for at least 16 GB RAM and sufficient disk space (20–40 GB or more, depending on models). This tutorial uses Docker Engine, Docker Compose plugin, and the NVIDIA Container Toolkit.
Network/Ports: Ollama exposes port 11434 (local only in this guide). Open WebUI will use port 3000. Adjust firewall rules if the server is internet-facing.
1) Install Docker Engine and Compose
Run the following commands to install Docker and the Compose plugin:
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
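To confirm the installation worked, you can optionally run a quick sanity check (the hello-world image is just a throwaway test container):
docker --version
docker compose version
docker run --rm hello-world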
2) Enable NVIDIA GPU for Containers
Install the NVIDIA Container Toolkit so Docker can pass your GPU to containers. Make sure the host driver is already installed and nvidia-smi works on the host.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Quick GPU test inside a container (optional):
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
3) Create Docker Compose for Ollama + Open WebUI
Create a working folder and a docker-compose.yml file:
mkdir -p ~/local-llm && cd ~/local-llm
nano docker-compose.yml
Paste the following content, then save:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=2h
    gpus: all
    ports:
      - "127.0.0.1:11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_me_long_random
      - ENABLE_SIGNUP=false
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"

volumes:
  ollama:
  open-webui:
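A note on the gpus: all line: it relies on a fairly recent Docker Compose release. If your version rejects it, the deploy-based GPU reservation below is a widely used alternative; as a sketch, replace the gpus: all line under the ollama service with:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]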
Binding Ollama to 127.0.0.1 keeps the model API private. Open WebUI is exposed on port 3000. For a remote VPS, secure it with a firewall or reverse proxy before exposing it.
4) Start the Stack and Pull a Model
Bring everything up:
docker compose up -d
Pull a model into Ollama (example: Llama 3.1 8B):
docker exec -it ollama ollama pull llama3.1:8b
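The same ollama CLI inside the container lets you list installed models or try one interactively from the terminal (type /bye to exit the chat):
docker exec -it ollama ollama list
docker exec -it ollama ollama run llama3.1:8b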
Test the API locally:
curl http://127.0.0.1:11434/api/tags
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one sentence."}'
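By default, /api/generate streams the answer as newline-delimited JSON chunks. For a single JSON response that is easier to read in the terminal, disable streaming:
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one sentence.","stream":false}'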
Open your browser to http://SERVER_IP:3000 (or http://localhost:3000) to access Open WebUI and start chatting with the model you pulled.
5) Secure Access
If this runs on a server, restrict port 3000 to trusted IPs or place it behind HTTPS using a reverse proxy (Caddy, Nginx, or Traefik). For quick private access through SSH tunneling, use:
ssh -L 3000:localhost:3000 user@SERVER_IP
Set ENABLE_SIGNUP=false to prevent public registrations and choose a strong WEBUI_SECRET_KEY. You can also bind Open WebUI to localhost only by changing the port mapping to 127.0.0.1:3000:8080 and serving it via your reverse proxy.
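As an illustration of the firewall approach with UFW, you could allow only a single trusted address to reach the UI and block everything else (203.0.113.10 is a placeholder; substitute your own IP):
sudo ufw allow from 203.0.113.10 to any port 3000 proto tcp
sudo ufw deny 3000/tcp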
6) Update, Backup, and Maintenance
Update images: keep the stack current with:
docker compose pull && docker compose up -d
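Pulling new images leaves the previous, now-untagged ones behind; you can reclaim that disk space afterwards with:
docker image prune -f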
Backup models and data: the named volumes hold your models and UI data. You can archive them like this:
mkdir -p ~/local-llm/backups && cd ~/local-llm
docker run --rm -v ollama:/data -v "$PWD/backups":/backup alpine sh -c 'tar czf /backup/ollama-vol.tar.gz -C /data .'
docker run --rm -v open-webui:/data -v "$PWD/backups":/backup alpine sh -c 'tar czf /backup/openwebui-vol.tar.gz -C /data .'
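Restoring works the same way in reverse. A sketch, assuming the archives created above: stop the stack, unpack each archive back into its volume, then start again:
docker compose down
docker run --rm -v ollama:/data -v "$PWD/backups":/backup alpine sh -c 'tar xzf /backup/ollama-vol.tar.gz -C /data'
docker run --rm -v open-webui:/data -v "$PWD/backups":/backup alpine sh -c 'tar xzf /backup/openwebui-vol.tar.gz -C /data'
docker compose up -d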
Stop/Start: docker compose down stops containers but keeps volumes. Use docker compose down -v to remove volumes as well (this deletes downloaded models and chat history).
7) Troubleshooting
GPU not detected: run nvidia-smi on the host; if it fails, reinstall the NVIDIA driver. Verify the container sees your GPU with docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. Ensure you ran sudo nvidia-ctk runtime configure --runtime=docker and restarted Docker.
Slow or out-of-memory: choose a smaller or quantized model (for example, llama3.1:8b or mistral:7b). Large models need more VRAM. You can also run CPU-only by removing GPU options, but performance will drop.
Port conflicts: change the host ports in docker-compose.yml (e.g., "127.0.0.1:11435:11434" and "3001:8080"), then docker compose up -d.
Cannot access WebUI: confirm the container is healthy with docker ps and check logs via docker logs open-webui. If remote, verify firewall rules allow your IP to reach port 3000 or use SSH tunneling.
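A quick way to check those points from the server itself (the last command prints the HTTP status code the UI returns locally):
docker ps --filter name=open-webui
docker logs --tail 50 open-webui
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3000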
What You Achieved
You now have a modern, GPU-accelerated local LLM platform on Ubuntu using Docker Compose. Ollama handles model management and API requests, while Open WebUI provides a polished chat experience. This stack is easy to update, simple to back up, and private by default—ideal for development, helpdesk knowledge assistants, and secure, offline AI workflows.