Overview
This tutorial shows how to deploy a local Large Language Model (LLM) stack on Ubuntu using Docker, with hardware acceleration for NVIDIA or AMD GPUs. We will combine Ollama (model runtime and manager) with Open WebUI (a fast, modern web interface) so you can chat with models like Llama 3.1 or Mistral on your own machine. The steps apply to Ubuntu 22.04/24.04, and are suitable for homelabs and small teams.
Prerequisites
- Ubuntu server or desktop with internet access
- A recent CPU; for GPU acceleration: an NVIDIA GPU with recent drivers, or an AMD GPU with ROCm support
- Sudo privileges and ports 11434 (Ollama) and 3000 (Open WebUI) available
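If you are not sure whether those ports are free, a quick check with ss (part of iproute2 on Ubuntu) will show any existing listeners; exact output varies by system:
ss -tlnp | grep -E ':(11434|3000)\b' || echo "ports 11434 and 3000 look free"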
Install Docker Engine
If Docker is not installed, use the official repository to get the latest stable version and the Compose plugin.
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the new group in the current shell; alternatively, log out and back in
docker --version
docker compose version
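As an optional sanity check, Docker's own hello-world image confirms that the daemon can pull and run containers without sudo:
docker run --rm hello-world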
Enable GPU Acceleration (NVIDIA)
Install the NVIDIA Container Toolkit so containers can access the GPU. Ensure the proprietary GPU driver is installed (e.g., 535+). Then run:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi
If nvidia-smi works on the host, the GPU will be available inside the containers when requested.
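To double-check that Docker registered the NVIDIA runtime, you can grep the daemon info; the exact wording differs between Docker versions, but an nvidia entry should appear:
docker info | grep -i runtimes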
Enable GPU Acceleration (AMD ROCm)
AMD support relies on ROCm. On supported GPUs and kernels, install the ROCm driver and runtime by following AMD's documentation for your GPU (typically by adding AMD's apt repository or using the amdgpu-install utility). Once the repository is set up, the install looks roughly like this:
sudo apt update
# Example meta-package (adjust to your ROCm version, distro, and GPU generation)
sudo apt install -y rocm-hip-runtime
/opt/rocm/bin/rocminfo
For Docker, we will pass the ROCm devices into Ollama’s container and use the ollama/ollama:rocm image, which bundles the ROCm libraries. Note that model availability and performance vary by GPU generation.
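Before editing the Compose file, it can help to confirm the ROCm device nodes exist and that your user can access them (ROCm normally requires membership in the render and video groups); a quick check:
ls -l /dev/kfd /dev/dri
groups $USER
# If the groups are missing, add them and log back in:
# sudo usermod -aG render,video $USER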
Create the Docker Compose file
We will run two services: ollama and open-webui. Create a project directory and a Compose file:
mkdir -p ~/ollama-openwebui
cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following Compose configuration. Choose ONE of the GPU sections (NVIDIA or AMD). If you don’t have a GPU, omit the device configurations to run on CPU.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # NVIDIA GPU (uncomment for NVIDIA)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
    # AMD ROCm (uncomment for AMD and change the image above to ollama/ollama:rocm)
    # devices:
    #   - "/dev/kfd:/dev/kfd"
    #   - "/dev/dri:/dev/dri"
    # environment:
    #   - HSA_OVERRIDE_GFX_VERSION=11.0.0  # only needed for some GPU generations; adjust or remove

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
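Before starting anything, you can ask Compose to validate and print the merged configuration; YAML indentation mistakes surface here immediately:
docker compose config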
Start the stack
Bring the services up in the background, then confirm they’re healthy.
docker compose up -d
docker compose ps
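You can also confirm the Ollama API is answering before opening the UI; its root endpoint returns a short status string, and the Open WebUI logs show startup progress:
curl http://localhost:11434
# Expected response: Ollama is running
docker compose logs --tail=20 open-webui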
Open your browser and visit http://SERVER_IP:3000. The first login creates an admin account. Open WebUI will auto-connect to Ollama.
Download a model in Ollama
You can pull a model via the Open WebUI interface or the CLI. For example, to pull Llama 3.1 and test it:
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1
In Open WebUI, select the model from the top bar and start chatting. If GPU is configured correctly, inference will run on the GPU.
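If you prefer the command line, you can also exercise the Ollama HTTP API directly from the host; this sketch assumes the llama3.1 model pulled above:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Say hello in one sentence.", "stream": false}'
While the request runs, nvidia-smi (or rocm-smi on AMD) on the host should show the model occupying GPU memory if acceleration is active.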
Securing access
By default, Open WebUI is exposed on port 3000 without TLS. For internet access, put it behind a reverse proxy like Nginx or Caddy with HTTPS, or use a VPN (e.g., Tailscale/WireGuard). On Ubuntu, restrict the firewall to your network:
sudo ufw allow from 192.168.0.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.0.0/24 to any port 11434 proto tcp
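As one illustrative option, a minimal Caddyfile (with a placeholder hostname you would replace) obtains a TLS certificate automatically and proxies to Open WebUI; this assumes Caddy is installed on the host and ports 80/443 are reachable:
# /etc/caddy/Caddyfile -- chat.example.com is a placeholder
chat.example.com {
    reverse_proxy localhost:3000
}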
Updating and backups
To update, pull the latest images and recreate containers without losing data (volumes keep models and UI data):
docker compose pull
docker compose up -d
For backups, snapshot the Docker volumes or copy them to external storage. On a single host, you can export and re-import volumes with standard tar workflows.
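A minimal sketch of that tar workflow, assuming Compose named the volume ollama-openwebui_ollama (the prefix comes from the project directory; verify with docker volume ls):
docker compose down
# Archive the model volume into the current directory
docker run --rm -v ollama-openwebui_ollama:/data -v "$PWD":/backup alpine tar czf /backup/ollama-volume.tar.gz -C /data .
# Restore it later into a volume of the same name
docker run --rm -v ollama-openwebui_ollama:/data -v "$PWD":/backup alpine sh -c "cd /data && tar xzf /backup/ollama-volume.tar.gz"
docker compose up -d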
Troubleshooting
- GPU not detected in container (NVIDIA): ensure the NVIDIA driver matches the toolkit; run docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. If it fails, recheck the toolkit setup and restart Docker.
- GPU not detected (AMD): verify rocminfo and clinfo on the host. Make sure /dev/kfd and /dev/dri are mapped and the user has permissions. Some older GPUs are unsupported by modern ROCm.
- Slow inference: use a smaller model (e.g., a 7B), keep the context length modest rather than maximal, and confirm the container is actually using the GPU. Consider enabling hugepages and ensure there is enough VRAM for the chosen model.
- Port conflicts: change the mapped ports in docker-compose.yml or stop services occupying them.
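For any of the issues above, the container logs are usually the quickest clue; Ollama typically logs at startup whether it detected a GPU, and watching nvidia-smi on the host while a prompt runs shows whether VRAM is actually being used:
docker compose logs --tail=100 ollama
watch -n 1 nvidia-smi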
Cleanup
To stop the stack, run docker compose down. To remove images and volumes too (irreversible), run docker compose down --volumes --rmi all.
You now have a private, GPU-accelerated LLM environment running Ollama with Open WebUI on Ubuntu. This setup is flexible, easy to upgrade, and ideal for secure, local AI experimentation and productivity.