Overview
This tutorial shows how to deploy a local Large Language Model (LLM) stack on Ubuntu using Docker, with hardware acceleration for NVIDIA or AMD GPUs. We will combine Ollama (model runtime and manager) with Open WebUI (a fast, modern web interface) so you can chat with models like Llama 3.1 or Mistral on your own machine. The steps apply to Ubuntu 22.04/24.04, and are suitable for homelabs and small teams.
Prerequisites
- Ubuntu server or desktop with internet access
- A recent CPU; for GPU acceleration: an NVIDIA GPU with recent drivers, or an AMD GPU with ROCm support
- Sudo privileges and ports 11434 (Ollama) and 3000 (Open WebUI) available
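If you are not sure whether those ports are free, a quick check with ss (part of iproute2 on Ubuntu) will show any existing listeners; exact output varies by system:
ss -tlnp | grep -E ':(11434|3000)\b' || echo "ports 11434 and 3000 look free"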
Install Docker Engine
If Docker is not installed, use the official repository to get the latest stable version and the Compose plugin.
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the new group in the current shell; alternatively, log out and back in
docker --version
docker compose version
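As an optional sanity check, Docker's own hello-world image confirms that the daemon can pull and run containers without sudo:
docker run --rm hello-world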
Enable GPU Acceleration (NVIDIA)
Install the NVIDIA Container Toolkit so containers can access the GPU. Ensure the proprietary GPU driver is installed (e.g., 535+). Then run:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi
If nvidia-smi works on the host, the GPU will be available inside the containers when requested.
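To double-check that Docker registered the NVIDIA runtime, you can grep the daemon info; the exact wording differs between Docker versions, but an nvidia entry should appear:
docker info | grep -i runtimes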
Enable GPU Acceleration (AMD ROCm)
AMD support relies on ROCm. On supported GPUs and kernels, install the ROCm driver and runtime by following AMD's documentation for your GPU (typically by adding AMD's apt repository or using the amdgpu-install utility). Once the repository is set up, the install looks roughly like this:
sudo apt update
# Example meta-package (adjust to your ROCm version, distro, and GPU generation)
sudo apt install -y rocm-hip-runtime
/opt/rocm/bin/rocminfo
For Docker, we will pass the ROCm devices into Ollama’s container and use the ollama/ollama:rocm image, which bundles the ROCm libraries. Note that model availability and performance vary by GPU generation.
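Before editing the Compose file, it can help to confirm the ROCm device nodes exist and that your user can access them (ROCm normally requires membership in the render and video groups); a quick check:
ls -l /dev/kfd /dev/dri
groups $USER
# If the groups are missing, add them and log back in:
# sudo usermod -aG render,video $USER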
Create the Docker Compose file
We will run two services: ollama and open-webui. Create a project directory and a Compose file:
mkdir -p ~/ollama-openwebui
cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following Compose configuration. Choose ONE of the GPU sections (NVIDIA or AMD). If you don’t have a GPU, omit the device configurations to run on CPU.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # NVIDIA GPU (uncomment for NVIDIA)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
    # AMD ROCm (uncomment for AMD and change the image above to ollama/ollama:rocm)
    # devices:
    #   - "/dev/kfd:/dev/kfd"
    #   - "/dev/dri:/dev/dri"
    # environment:
    #   - HSA_OVERRIDE_GFX_VERSION=11.0.0  # only needed for some GPU generations; adjust or remove

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
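Before starting anything, you can ask Compose to validate and print the merged configuration; YAML indentation mistakes surface here immediately:
docker compose config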
Start the stack
Bring the services up in the background, then confirm they’re healthy.
docker compose up -d
docker compose ps
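You can also confirm the Ollama API is answering before opening the UI; its root endpoint returns a short status string, and the Open WebUI logs show startup progress:
curl http://localhost:11434
# Expected response: Ollama is running
docker compose logs --tail=20 open-webui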
Open your browser and visit http://SERVER_IP:3000. The first login creates an admin account. Open WebUI will auto-connect to Ollama.
Download a model in Ollama
You can pull a model via the Open WebUI interface or the CLI. For example, to pull Llama 3.1 and test it:
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1
In Open WebUI, select the model from the top bar and start chatting. If GPU is configured correctly, inference will run on the GPU.
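If you prefer the command line, you can also exercise the Ollama HTTP API directly from the host; this sketch assumes the llama3.1 model pulled above:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Say hello in one sentence.", "stream": false}'
While the request runs, nvidia-smi (or rocm-smi on AMD) on the host should show the model occupying GPU memory if acceleration is active.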
Securing access
By default, Open WebUI is exposed on port 3000 without TLS. For internet access, put it behind a reverse proxy like Nginx or Caddy with HTTPS, or use a VPN (e.g., Tailscale/WireGuard). On Ubuntu, restrict the firewall to your network:
sudo ufw allow from 192.168.0.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.0.0/24 to any port 11434 proto tcp
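As one illustrative option, a minimal Caddyfile (with a placeholder hostname you would replace) obtains a TLS certificate automatically and proxies to Open WebUI; this assumes Caddy is installed on the host and ports 80/443 are reachable:
# /etc/caddy/Caddyfile -- chat.example.com is a placeholder
chat.example.com {
    reverse_proxy localhost:3000
}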
Updating and backups
To update, pull the latest images and recreate containers without losing data (volumes keep models and UI data):
docker compose pull
docker compose up -d
For backups, snapshot the Docker volumes or copy them to external storage. On a single host, you can export and re-import volumes with standard tar workflows.
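A minimal sketch of that tar workflow, assuming Compose named the volume ollama-openwebui_ollama (the prefix comes from the project directory; verify with docker volume ls):
docker compose down
# Archive the model volume into the current directory
docker run --rm -v ollama-openwebui_ollama:/data -v "$PWD":/backup alpine tar czf /backup/ollama-volume.tar.gz -C /data .
# Restore it later into a volume of the same name
docker run --rm -v ollama-openwebui_ollama:/data -v "$PWD":/backup alpine sh -c "cd /data && tar xzf /backup/ollama-volume.tar.gz"
docker compose up -d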
Troubleshooting
- GPU not detected in container (NVIDIA): ensure the NVIDIA driver matches the toolkit; run docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. If it fails, recheck the toolkit setup and restart Docker.
- GPU not detected (AMD): verify rocminfo and clinfo on the host. Make sure /dev/kfd and /dev/dri are mapped and the user has permissions. Some older GPUs are unsupported by modern ROCm.
- Slow inference: use a smaller model (e.g., a 7B), keep the context length modest rather than maximal, and confirm the container is actually using the GPU. Consider enabling hugepages and ensure there is enough VRAM for the chosen model.
- Port conflicts: change the mapped ports in docker-compose.yml or stop services occupying them.
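For any of the issues above, the container logs are usually the quickest clue; Ollama typically logs at startup whether it detected a GPU, and watching nvidia-smi on the host while a prompt runs shows whether VRAM is actually being used:
docker compose logs --tail=100 ollama
watch -n 1 nvidia-smi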
Cleanup
To stop the stack, run docker compose down. To remove images and volumes too (irreversible), run docker compose down --volumes --rmi all.
You now have a private, GPU-accelerated LLM environment running Ollama with Open WebUI on Ubuntu. This setup is flexible, easy to upgrade, and ideal for secure, local AI experimentation and productivity.