Run Local LLMs with GPU: Deploy Ollama + Open WebUI on Docker (Ubuntu 24.04)

Overview

Running large language models (LLMs) locally is easier and faster than ever with Ollama and Open WebUI. In this tutorial, you will deploy both on Docker with optional NVIDIA GPU acceleration on Ubuntu 22.04 or 24.04. We will cover prerequisites, the Docker Compose file, model downloads, security tips, updates, and troubleshooting, so you can get a private AI assistant running in minutes.

Prerequisites

Before you start, ensure you have: (1) Ubuntu 22.04 or 24.04 with sudo access, (2) Docker Engine and Docker Compose v2, (3) an NVIDIA GPU with the proprietary driver installed (optional but recommended), (4) at least 16 GB RAM and 20+ GB of free disk space for models, and (5) open firewall ports 3000 (Open WebUI) and 11434 (Ollama) if you will access them remotely.

Step 0: Verify NVIDIA drivers (GPU users)

If you plan to use GPU acceleration, install the latest NVIDIA driver and verify it works. Run: nvidia-smi. You should see your GPU listed with a driver version. If the command is missing, install the driver using: sudo ubuntu-drivers autoinstall, reboot, then test nvidia-smi again.
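For reference, the whole check from this step looks like this:

nvidia-smi                      # should print your GPU and driver version
# If the command is not found, install the recommended driver:
sudo ubuntu-drivers autoinstall
sudo reboot
nvidia-smi                      # verify again after the reboot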

Step 1: Install Docker Engine and Compose

Install Docker from the official repository for the best compatibility. Example quick setup:
1) sudo apt update && sudo apt install -y ca-certificates curl gnupg
2) sudo install -m 0755 -d /etc/apt/keyrings
3) curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
4) echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list
5) sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
6) Optional: sudo usermod -aG docker $USER and re-login to run Docker without sudo.
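To confirm the installation before moving on, you can check the versions and run Docker's test image:

docker --version
docker compose version
docker run --rm hello-world     # prints "Hello from Docker!" if the engine works
# Prefix the last command with sudo if you skipped the usermod step or have not re-logged in.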

Step 2: Install NVIDIA Container Toolkit (GPU users)

This step lets Docker containers access your GPU. Run:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test GPU inside Docker: docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi. If you see your GPU, you are ready.
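If you want to double-check what nvidia-ctk changed, the runtime registration ends up in Docker's daemon configuration:

cat /etc/docker/daemon.json     # should now contain an "nvidia" entry under "runtimes"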

Step 3: Create the Docker Compose file

Create a project directory, for example: mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui. Then create docker-compose.yml with the following content. This setup persists models and WebUI data, exposes ports, and enables GPU if available.

version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    gpus: "all"

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:

Notes: (a) The line gpus: "all" enables acceleration when the NVIDIA Container Toolkit is present; Docker Compose maps it to --gpus all. (b) If you do not have a GPU, remove the gpus: "all" line; without the NVIDIA runtime Docker cannot satisfy the GPU request and the container will fail to start, while the Ollama image falls back to CPU automatically once the line is removed. (c) Ports 11434 and 3000 can be changed to fit your environment or firewall rules.
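If your Compose version does not support the service-level gpus attribute, the longer deploy.resources form requests the GPU in an equivalent way. The sketch below only shows the fields you would add to the ollama service, keeping the rest of the file unchanged:

  ollama:
    # ...image, volumes, and ports as above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]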

Step 4: Start the stack

From the project directory, run docker compose up -d. Wait a few seconds, then confirm both containers are healthy with docker compose ps and view logs with docker logs -f ollama or docker logs -f open-webui.
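For convenience, the full sequence is:

cd ~/ollama-openwebui
docker compose up -d
docker compose ps               # both containers should show as running
docker logs -f ollama           # Ctrl+C stops following the logs
docker logs -f open-webui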

Step 5: Pull a model and test

Ollama downloads models on demand. You can preload a model via the container: docker exec -it ollama ollama pull llama3.2:3b. Smaller models (2B–7B parameters) are faster and use less RAM; larger models offer higher quality but need more resources. After the pull finishes, open your browser to http://<server-ip>:3000, create your first account, and in Open WebUI select the model (for example, llama3.2:3b) to start chatting.
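You can also verify the model from the command line before opening the browser; this assumes port 11434 is published on the host as in the compose file above:

docker exec -it ollama ollama list      # models stored in the ollama volume
curl http://localhost:11434/api/tags    # same list via the HTTP API
curl http://localhost:11434/api/generate -d '{"model": "llama3.2:3b", "prompt": "Say hello in one sentence.", "stream": false}'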

Step 6: Secure access

By default, Open WebUI allows signups. After creating your admin account, you can restrict access. Edit docker-compose.yml under the open-webui service and add: ENABLE_SIGNUP=false in the environment section, then run docker compose up -d again. For internet exposure, put Open WebUI behind a reverse proxy (Nginx, Caddy, or Traefik), enable HTTPS with Let’s Encrypt, and consider firewalling or a zero-trust tunnel (Cloudflare/Tailscale) for extra protection.
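The relevant part of the open-webui service would then look like this (only the environment block changes):

    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - ENABLE_SIGNUP=false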

Maintenance and updates

To update containers without deleting your data: docker compose pull followed by docker compose up -d. To update or remove models: docker exec -it ollama ollama pull <model:tag> and docker exec -it ollama ollama rm <model:tag>. To stop the stack: docker compose down. To remove everything including volumes, add -v, but this deletes models and WebUI data.
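Put together, a typical update cycle looks like this:

docker compose pull                               # fetch newer ollama / open-webui images
docker compose up -d                              # recreate containers; data in volumes is kept
docker exec -it ollama ollama pull llama3.2:3b    # refresh a model (example tag)
docker image prune -f                             # optional: remove old, now-unused image layers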

Troubleshooting

If Open WebUI cannot see Ollama, confirm the internal URL is correct: OLLAMA_API_BASE_URL=http://ollama:11434. Check container-to-container connectivity with docker exec open-webui wget -qO- http://ollama:11434/api/tags (or, from the host, curl http://localhost:11434/api/tags, since the port is published).

If GPU is not detected, verify the NVIDIA toolkit: run docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi. If that works but the Ollama container is still CPU-only, confirm the gpus: "all" line is present, Docker was restarted after nvidia-ctk runtime configure, and the driver version is compatible with your GPU.
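A quick way to confirm that Docker actually registered the NVIDIA runtime is to inspect docker info:

docker info | grep -i nvidia    # should mention the nvidia runtime after nvidia-ctk configuration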

If model downloads are slow or interrupted, restart the Ollama container and run the pull again. Note that a Docker registry mirror only speeds up Docker image pulls, not Ollama model downloads. Also verify free disk space with df -h, because model files can be large.

What you achieved

You now have a private, local AI stack powered by Ollama and Open WebUI, running in Docker with persistent storage and optional GPU acceleration. This architecture is easy to back up, simple to update, and flexible enough to host multiple models. Add a reverse proxy for TLS, schedule backups of the ollama and openwebui volumes, and explore advanced features such as embeddings, RAG, and function calling as you grow your setup.
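For backups, a simple approach is to archive the named volumes with a throwaway container. This is a minimal sketch that assumes the default Compose project naming (check the real names with docker volume ls, e.g. ollama-openwebui_ollama):

docker volume ls                                   # find the exact volume names
docker run --rm -v ollama-openwebui_ollama:/data -v "$PWD":/backup alpine \
  tar czf /backup/ollama-volume.tar.gz -C /data .
docker run --rm -v ollama-openwebui_openwebui:/data -v "$PWD":/backup alpine \
  tar czf /backup/openwebui-volume.tar.gz -C /data .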
