Overview
This tutorial shows how to deploy a private, local AI stack with Ollama (model runtime) and Open WebUI (chat interface) using Docker on Ubuntu 22.04/24.04. You will learn how to run it on CPU, enable NVIDIA or AMD/ROCm GPU acceleration, secure the web interface, and keep everything up to date. The result is a fast, reliable, and low-maintenance setup suitable for labs, developers, and small teams.
Prerequisites
You need an Ubuntu 22.04 or 24.04 system with sudo access, 16 GB+ RAM (more is better), 20 GB+ of free disk space, and a stable internet connection. For GPU acceleration, use a recent NVIDIA GPU with the official drivers or an AMD GPU supported by ROCm. Ensure ports 11434 (Ollama) and 3000 (Open WebUI) are free. If you plan to expose the service on the internet, prepare a domain name and a DNS A/AAAA record pointing to the server.
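To confirm the two ports are actually free before you start, you can list listening sockets (ss ships with Ubuntu as part of iproute2):
ss -ltn | grep -E ':11434|:3000' || echo "ports 11434 and 3000 are free"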
Step 1: Install Docker Engine and Compose
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
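To verify the installation before moving on, run the standard hello-world check without sudo; if it succeeds, Docker and your group membership are working:
docker --version
docker run --rm hello-world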
Step 2: GPU Preparation (optional but recommended)
NVIDIA: Install the proprietary driver and the NVIDIA Container Toolkit so Docker can access your GPU.
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
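Before starting the stack, it is worth confirming that containers can actually see the GPU. A quick check, assuming the driver and toolkit installed cleanly (any small image works; ubuntu is used here only as an example):
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi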
AMD (ROCm): Ensure your GPU is ROCm-capable and the kfd and dri devices are present. Give your user access to the required groups.
sudo usermod -aG render,video $USER
sudo reboot
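After the reboot, confirm the ROCm devices exist and that your user picked up the new groups:
ls -l /dev/kfd /dev/dri
groups | tr ' ' '\n' | grep -E 'render|video'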
Step 3: Create a Docker Compose file
Create a working directory such as ~/ai-stack, then create docker-compose.yml inside it. The following example starts Ollama and Open WebUI with named volumes for persistence and includes variants for CPU, NVIDIA, and AMD; keep only one GPU option active at a time.
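For example:
mkdir -p ~/ai-stack
cd ~/ai-stack
nano docker-compose.yml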
docker-compose.yml (CPU-only by default):
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_THREADS=8
    # For NVIDIA GPU: uncomment the runtime line below and add the two NVIDIA_*
    # variables to the environment list above (leave the AMD lines commented).
    # runtime: nvidia
    #   - NVIDIA_VISIBLE_DEVICES=all
    #   - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    # For AMD ROCm GPU: use the ROCm image and device mappings instead.
    # image: ollama/ollama:rocm
    # devices:
    #   - /dev/kfd
    #   - /dev/dri
    # group_add:
    #   - "video"
    #   - "render"

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
Step 4: Start the stack
docker compose up -d
Check containers and logs to confirm both services are healthy.
docker ps
docker logs -f ollama
docker logs -f open-webui
Step 5: Pull a model and test
Use Ollama to download a model. Popular choices are llama3.1:8b, llama3.1:70b (needs more VRAM), mistral, or qwen2. Start with an 8B or 7B model to validate your setup.
docker exec -it ollama ollama pull llama3.1:8b
curl http://localhost:11434/api/tags
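You can also exercise the generation endpoint directly from the host as a quick smoke test against the standard Ollama REST API (the model name matches the one pulled above):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'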
Open a browser to http://<server-ip>:3000. The first user to sign up in Open WebUI becomes the admin. In Settings, confirm the Ollama endpoint points to http://ollama:11434 (it is already set via OLLAMA_BASE_URL). Create a new chat and pick your model from the dropdown.
Step 6: Optional security and HTTPS
By default, Open WebUI is accessible on port 3000 and provides its own user system. For internet exposure, put it behind an HTTPS reverse proxy and disable public signups after creating the admin. If you use UFW, allow only necessary ports:
sudo ufw allow 22/tcp
sudo ufw allow 80,443/tcp
sudo ufw enable
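If you only want the web UI reachable from your LAN rather than the internet, you can scope a rule for port 3000 to your local subnet instead (192.168.1.0/24 is just an example; adjust to your network):
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp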
A simple approach is to add a Caddy or Nginx reverse proxy in front of Open WebUI for automatic TLS. Map your domain (e.g., ai.example.com) to the server, then proxy requests to open-webui:8080. Limit administrative access using firewall rules, strong passwords, and, if available, SSO/OIDC in Open WebUI.
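As a sketch, assuming you add a caddy service to the same Compose file (publishing ports 80 and 443 and sharing the network so it can resolve open-webui by service name) and that ai.example.com points at the server, a minimal Caddyfile could look like this; Caddy then obtains and renews the TLS certificate automatically:
cat > Caddyfile <<'EOF'
ai.example.com {
    reverse_proxy open-webui:8080
}
EOF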
Step 7: Updating and backing up
To update images to the latest versions and apply them with minimal downtime:
cd ~/ai-stack
docker compose pull
docker compose up -d
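Optionally, reclaim disk space afterwards by removing the old, now-unused image layers:
docker image prune -f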
Your models and chat data live in Docker volumes. Back them up regularly:
docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama-vol-$(date +%F).tgz -C /data .
docker run --rm -v openwebui:/data -v $(pwd):/backup alpine tar czf /backup/openwebui-vol-$(date +%F).tgz -C /data .
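Restoring works the same way in reverse; for example, to restore the Ollama volume from a given archive (stop the stack first, and substitute the actual backup filename):
docker compose down
docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar xzf /backup/ollama-vol-YYYY-MM-DD.tgz -C /data
docker compose up -d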
Troubleshooting tips
If the GPU is not used on NVIDIA, confirm that nvidia-smi works on the host and that the container runtime is configured. For AMD, ensure /dev/kfd and /dev/dri exist and that the container uses the ollama/ollama:rocm image with the proper device mappings. Model loading failures typically indicate insufficient RAM/VRAM; try a smaller quantization or a smaller model. If the UI cannot see Ollama, verify OLLAMA_BASE_URL and that the containers can resolve each other by service name.
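A few commands that help narrow things down, assuming the NVIDIA variant with the utility capability enabled (ollama ps reports whether a loaded model is running on the GPU or has fallen back to CPU):
docker exec -it ollama nvidia-smi
docker exec -it ollama ollama ps
docker exec -it open-webui printenv OLLAMA_BASE_URL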
You are done
You now have a modern, private AI chat stack running on Docker with optional GPU acceleration. Ollama keeps model management simple, and Open WebUI provides a clean, multi-user interface. This setup is easy to maintain, portable across servers, and ready for experimentation with different open-source models and embeddings.