How to Run Local AI: Deploy Ollama and Open WebUI with NVIDIA GPU on Ubuntu via Docker Compose

Overview

This step-by-step guide shows you how to run a local AI stack on Ubuntu 22.04/24.04 using Docker Compose, Ollama, and Open WebUI with NVIDIA GPU acceleration. Ollama provides a lightweight local API for popular large language models (LLMs) like Llama 3, Mistral, and Qwen, while Open WebUI delivers a clean, user-friendly chat interface. By the end, you will have a secure, updatable setup that serves a local LLM with GPU support for fast responses and offline privacy.

Prerequisites

System: Ubuntu 22.04 or 24.04 with a recent NVIDIA GPU driver installed. Aim for at least 16 GB RAM and sufficient disk space (20–40 GB or more, depending on models). This tutorial uses Docker Engine, Docker Compose plugin, and the NVIDIA Container Toolkit.

Network/Ports: Ollama exposes port 11434 (local only in this guide). Open WebUI will use port 3000. Adjust firewall rules if the server is internet-facing.
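
Before continuing, a quick sanity check on the host helps (the exact output will differ on your machine):

nvidia-smi        # confirms the NVIDIA driver is installed and the GPU is visible
free -h           # total RAM
df -h /           # free disk space (Docker stores volumes under /var/lib/docker by default)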

1) Install Docker Engine and Compose

Run the following commands to install Docker and the Compose plugin:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
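
To confirm the installation before moving on, check the versions and run Docker's test image (version numbers will vary):

docker --version
docker compose version
docker run --rm hello-world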

2) Enable NVIDIA GPU for Containers

Install the NVIDIA Container Toolkit so Docker can pass your GPU to containers. Make sure the host driver is already installed and nvidia-smi works on the host.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Quick GPU test inside a container (optional):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

3) Create Docker Compose for Ollama + Open WebUI

Create a working folder and a docker-compose.yml file:

mkdir -p ~/local-llm && cd ~/local-llm
nano docker-compose.yml

Paste the following content, then save:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=2h
    gpus: all
    ports:
      - "127.0.0.1:11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_me_long_random
      - ENABLE_SIGNUP=false
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"

volumes:
  ollama:
  open-webui:

Binding Ollama to 127.0.0.1 keeps the model API private to the host. Open WebUI is published on port 3000 on all interfaces; on a remote VPS, restrict access with a firewall or place it behind a reverse proxy before exposing it to the internet.
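
The service-level gpus: all attribute requires a reasonably recent Docker Compose release. If docker compose config rejects it on your system, an equivalent and more widely supported form is a device reservation under the ollama service, roughly like this:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Either form passes all host GPUs through to the Ollama container.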

4) Start the Stack and Pull a Model

Bring everything up:

docker compose up -d
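
Check that both containers are running:

docker compose ps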

Pull a model into Ollama (example: Llama 3.1 8B):

docker exec -it ollama ollama pull llama3.1:8b
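
You can confirm the download finished with Ollama's list command:

docker exec -it ollama ollama list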

Test the API locally:

curl http://127.0.0.1:11434/api/tags
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one sentence."}'
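
The generate endpoint streams its answer line by line by default; if you prefer a single JSON response, disable streaming in the request:

curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one sentence.","stream":false}'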

Open your browser to http://SERVER_IP:3000 (or http://localhost:3000) to access Open WebUI and start chatting with the model you pulled.

5) Secure Access

If this runs on a server, restrict port 3000 to trusted IPs or place it behind HTTPS using a reverse proxy (Caddy, Nginx, or Traefik). For quick private access through SSH tunneling, use:

ssh -L 3000:localhost:3000 user@SERVER_IP

Set ENABLE_SIGNUP=false to prevent public registrations and choose a strong WEBUI_SECRET_KEY. You can also bind Open WebUI to localhost only by changing the port mapping to 127.0.0.1:3000:8080 and serving it via your reverse proxy.
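
As an example, with the mapping changed to 127.0.0.1:3000:8080, a minimal Caddyfile that publishes Open WebUI over HTTPS could look like this (chat.example.com is a placeholder for your own domain pointing at the server):

chat.example.com {
    reverse_proxy 127.0.0.1:3000
}

Caddy then obtains and renews the TLS certificate automatically.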

6) Update, Backup, and Maintenance

Update images: keep the stack current with:

docker compose pull && docker compose up -d
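
Old image layers accumulate after repeated updates; you can optionally reclaim the disk space afterwards:

docker image prune -f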

Backup models and data: the named volumes hold your models and UI data. Docker Compose prefixes volume names with the project name, which defaults to the folder name (here local-llm), so the volumes are local-llm_ollama and local-llm_open-webui. You can archive them like this:

mkdir -p ~/local-llm/backups && cd ~/local-llm
docker run --rm -v local-llm_ollama:/data -v "$PWD/backups":/backup alpine sh -c 'tar czf /backup/ollama-vol.tar.gz -C /data .'
docker run --rm -v local-llm_open-webui:/data -v "$PWD/backups":/backup alpine sh -c 'tar czf /backup/openwebui-vol.tar.gz -C /data .'
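
To restore, stop the stack and extract the archives back into the volumes (this assumes the backup files created above):

docker compose down
docker run --rm -v local-llm_ollama:/data -v "$PWD/backups":/backup alpine sh -c 'tar xzf /backup/ollama-vol.tar.gz -C /data'
docker run --rm -v local-llm_open-webui:/data -v "$PWD/backups":/backup alpine sh -c 'tar xzf /backup/openwebui-vol.tar.gz -C /data'
docker compose up -d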

Stop/Start: docker compose down stops containers but keeps volumes. Use docker compose down -v to remove volumes as well (this deletes downloaded models and chat history).

7) Troubleshooting

GPU not detected: run nvidia-smi on the host; if it fails, reinstall the NVIDIA driver. Verify the container sees your GPU with docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. Ensure you ran sudo nvidia-ctk runtime configure --runtime=docker and restarted Docker.

Slow or out-of-memory: choose a smaller or quantized model (for example, llama3.1:8b or mistral:7b). Large models need more VRAM. You can also run CPU-only by removing GPU options, but performance will drop.
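
To see which models are loaded and how much VRAM they use, these two commands help:

docker exec -it ollama ollama ps    # loaded models and the CPU/GPU split
nvidia-smi                          # current VRAM usage on the host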

Port conflicts: change the host ports in docker-compose.yml (e.g., "127.0.0.1:11435:11434" and "3001:8080"), then docker compose up -d.

Cannot access WebUI: confirm the container is healthy with docker ps and check logs via docker logs open-webui. If remote, verify firewall rules allow your IP to reach port 3000 or use SSH tunneling.

What You Achieved

You now have a modern, GPU-accelerated local LLM platform on Ubuntu using Docker Compose. Ollama handles model management and API requests, while Open WebUI provides a polished chat experience. This stack is easy to update, simple to back up, and private by default—ideal for development, helpdesk knowledge assistants, and secure, offline AI workflows.
