Overview
This tutorial shows how to deploy a private, local AI stack with Ollama (model runtime) and Open WebUI (chat interface) using Docker on Ubuntu 22.04/24.04. You will learn how to run it on CPU, enable NVIDIA or AMD/ROCm GPU acceleration, secure the web interface, and keep everything up to date. The result is a fast, reliable, and low-maintenance setup suitable for labs, developers, and small teams.
Prerequisites
You need an Ubuntu 22.04 or 24.04 system with sudo access, 16 GB+ RAM (more is better), 20 GB+ of free disk space, and a stable internet connection. For GPU acceleration, use a recent NVIDIA GPU with the official drivers or an AMD GPU supported by ROCm. Ensure ports 11434 (Ollama) and 3000 (Open WebUI) are free. If you plan to expose the service on the internet, prepare a domain name and a DNS A/AAAA record pointing to the server.
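To confirm the two ports are actually free before you start, you can list listening sockets (ss ships with Ubuntu as part of iproute2):
ss -ltn | grep -E ':11434|:3000' || echo "ports 11434 and 3000 are free"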
Step 1: Install Docker Engine and Compose
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
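To verify the installation before moving on, run the standard hello-world check without sudo; if it succeeds, Docker and your group membership are working:
docker --version
docker run --rm hello-world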
Step 2: GPU Preparation (optional but recommended)
NVIDIA: Install the proprietary driver and the NVIDIA Container Toolkit so Docker can access your GPU.
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
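Before starting the stack, it is worth confirming that containers can actually see the GPU. A quick check, assuming the driver and toolkit installed cleanly (any small image works; ubuntu is used here only as an example):
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi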
AMD (ROCm): Ensure your GPU is ROCm-capable and the kfd and dri devices are present. Give your user access to the required groups.
sudo usermod -aG render,video $USER
sudo reboot
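After the reboot, confirm the ROCm devices exist and that your user picked up the new groups:
ls -l /dev/kfd /dev/dri
groups | tr ' ' '\n' | grep -E 'render|video'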
Step 3: Create a Docker Compose file
Create a working directory such as ~/ai-stack, then create docker-compose.yml inside it. The following example starts Ollama and Open WebUI with named volumes for persistence and includes variants for CPU, NVIDIA, and AMD; keep only one GPU option active at a time.
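For example:
mkdir -p ~/ai-stack
cd ~/ai-stack
nano docker-compose.yml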
docker-compose.yml (CPU-only by default):
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_THREADS=8
    # For NVIDIA GPU: uncomment the runtime line below and add the two NVIDIA_*
    # variables to the environment list above (leave the AMD lines commented).
    # runtime: nvidia
    #   - NVIDIA_VISIBLE_DEVICES=all
    #   - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    # For AMD ROCm GPU: use the ROCm image and device mappings instead.
    # image: ollama/ollama:rocm
    # devices:
    #   - /dev/kfd
    #   - /dev/dri
    # group_add:
    #   - "video"
    #   - "render"

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
Step 4: Start the stack
docker compose up -d
Check containers and logs to confirm both services are healthy.
docker ps
docker logs -f ollama
docker logs -f open-webui
Step 5: Pull a model and test
Use Ollama to download a model. Popular choices are llama3.1:8b, llama3.1:70b (needs more VRAM), mistral, or qwen2. Start with an 8B or 7B model to validate your setup.
docker exec -it ollama ollama pull llama3.1:8b
curl http://localhost:11434/api/tags
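You can also exercise the generation endpoint directly from the host as a quick smoke test against the standard Ollama REST API (the model name matches the one pulled above):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'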
Open a browser to http://<server-ip>:3000. The first user to sign up in Open WebUI becomes the admin. In Settings, confirm the Ollama endpoint points to http://ollama:11434 (it is already set via OLLAMA_BASE_URL). Create a new chat and pick your model from the dropdown.
Step 6: Optional security and HTTPS
By default, Open WebUI is accessible on port 3000 and provides its own user system. For internet exposure, put it behind an HTTPS reverse proxy and disable public signups after creating the admin. If you use UFW, allow only necessary ports:
sudo ufw allow 22/tcp
sudo ufw allow 80,443/tcp
sudo ufw enable
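If you only want the web UI reachable from your LAN rather than the internet, you can scope a rule for port 3000 to your local subnet instead (192.168.1.0/24 is just an example; adjust to your network):
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp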
A simple approach is to add a Caddy or Nginx reverse proxy in front of Open WebUI for automatic TLS. Map your domain (e.g., ai.example.com) to the server, then proxy requests to open-webui:8080. Limit administrative access using firewall rules, strong passwords, and, if available, SSO/OIDC in Open WebUI.
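As a sketch, assuming you add a caddy service to the same Compose file (publishing ports 80 and 443 and sharing the network so it can resolve open-webui by service name) and that ai.example.com points at the server, a minimal Caddyfile could look like this; Caddy then obtains and renews the TLS certificate automatically:
cat > Caddyfile <<'EOF'
ai.example.com {
    reverse_proxy open-webui:8080
}
EOF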
Step 7: Updating and backing up
To update images to the latest versions and apply them with minimal downtime:
cd ~/ai-stack
docker compose pull
docker compose up -d
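Optionally, reclaim disk space afterwards by removing the old, now-unused image layers:
docker image prune -f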
Your models and chat data live in Docker volumes. Back them up regularly:
docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama-vol-$(date +%F).tgz -C /data .
docker run --rm -v openwebui:/data -v $(pwd):/backup alpine tar czf /backup/openwebui-vol-$(date +%F).tgz -C /data .
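Restoring works the same way in reverse; for example, to restore the Ollama volume from a given archive (stop the stack first, and substitute the actual backup filename):
docker compose down
docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar xzf /backup/ollama-vol-YYYY-MM-DD.tgz -C /data
docker compose up -d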
Troubleshooting tips
If the GPU is not used on NVIDIA, confirm that nvidia-smi works on the host and that the container runtime is configured. For AMD, ensure /dev/kfd and /dev/dri exist and that the container uses the ollama/ollama:rocm image with the proper device mappings. Model loading failures typically indicate insufficient RAM/VRAM; try a smaller quantization or a smaller model. If the UI cannot see Ollama, verify OLLAMA_BASE_URL and that the containers can resolve each other by service name.
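A few commands that help narrow things down, assuming the NVIDIA variant with the utility capability enabled (ollama ps reports whether a loaded model is running on the GPU or has fallen back to CPU):
docker exec -it ollama nvidia-smi
docker exec -it ollama ollama ps
docker exec -it open-webui printenv OLLAMA_BASE_URL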
You are done
You now have a modern, private AI chat stack running on Docker with optional GPU acceleration. Ollama keeps model management simple, and Open WebUI provides a clean, multi-user interface. This setup is easy to maintain, portable across servers, and ready for experimentation with different open-source models and embeddings.