Overview
This step-by-step guide shows how to deploy Ollama (for running local LLMs) together with Open WebUI (a modern, browser-based interface) using Docker on Ubuntu. We will enable NVIDIA GPU acceleration for faster inference, set up persistent storage, and cover useful operational tips. By the end, you will have a local, privacy-friendly AI stack that you can run offline and manage easily.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access
- An NVIDIA GPU with recent drivers (535+ recommended)
- At least 16 GB RAM for larger models; ensure enough disk space for model files
- Internet access to pull Docker images and models
1) Install Docker and Docker Compose
Install Docker from the official repository so you get the latest engine and the Compose plugin:
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
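A quick sanity check that the engine and the Compose plugin are in place (exact versions will differ on your system):

docker --version
docker compose version
docker run --rm hello-world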
2) Install NVIDIA Drivers and Container Toolkit
Make sure the proprietary NVIDIA driver is installed and working (verify with nvidia-smi). Then install the NVIDIA Container Toolkit so Docker can access the GPU:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) && \
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test GPU visibility inside containers:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
3) Create the Docker Compose file
Create a project directory and a docker-compose.yml with two services: ollama and open-webui. This configuration exposes Open WebUI on port 8080 and mounts persistent volumes for both services.
mkdir -p ~/ollama-stack && cd ~/ollama-stack
nano docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2
    runtime: nvidia
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_NAME=Local LLM Console
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
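Before launching anything, you can have Compose validate the file; this only parses the YAML and reports errors:

docker compose config --quiet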
4) Launch the stack
Start both containers with Docker Compose:
docker compose up -d
Open WebUI will be available at http://<your-server-ip>:8080. The first visit prompts you to create an admin user.
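To confirm the stack is healthy before opening the browser, a couple of quick checks (standard Compose commands and Ollama's HTTP API on the published port):

docker compose ps                          # both containers should be "running"
curl http://localhost:11434/api/version    # Ollama answers with its version
curl http://localhost:11434/api/tags       # lists local models (empty until you pull one)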
5) Pull and run a model
You can pull models using the Ollama CLI inside the container. For example, to fetch and run Llama 3 (an 8B model):
docker exec -it ollama ollama pull llama3
docker exec -it ollama ollama run llama3
In Open WebUI, select the same model name (e.g., llama3) from the model dropdown. You can add system prompts, adjust temperature, or manage multiple models.
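If you prefer scripting against the Ollama HTTP API directly from the host (assuming the default 11434 port mapping and that llama3 has been pulled), a minimal non-streaming request looks like:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain Docker volumes in one sentence.",
  "stream": false
}'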
6) Persist, update, and back up
- Persistent data: All models live in the ollama-data volume, and Open WebUI settings live in openwebui-data.
- Update images: docker compose pull && docker compose up -d
- Back up volumes: docker run --rm -v ollama-data:/data -v $(pwd):/backup busybox tar czf /backup/ollama-data.tgz /data
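To restore that archive into a (possibly fresh) ollama-data volume, a sketch assuming the same volume name and archive path as above:

docker run --rm -v ollama-data:/data -v $(pwd):/backup busybox tar xzf /backup/ollama-data.tgz -C /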
7) Optional: restrict access and add HTTPS
For a single-user setup, it is safer to bind Open WebUI to localhost and use an SSH tunnel. Change the Open WebUI service port mapping from "8080:8080" to "127.0.0.1:8080:8080" and restart. Then access it with ssh -L 8080:localhost:8080 user@server.
If you need public access with HTTPS, place a reverse proxy (e.g., Caddy or Nginx) in front. With Caddy, a minimal site block looks like:
ai.example.com {
    reverse_proxy 127.0.0.1:8080
}
Point DNS to your server and Caddy will fetch certificates automatically. Add HTTP auth or an allowlist if the instance is exposed to the internet.
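As a hedged sketch of basic auth in Caddy, generate a bcrypt hash with caddy hash-password and reference it in the site block (the hash below is a placeholder, not a real value):

ai.example.com {
    basicauth {
        admin <paste-hash-from-caddy-hash-password>
    }
    reverse_proxy 127.0.0.1:8080
}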
8) Troubleshooting
Docker container cannot see the GPU: Confirm the host runs nvidia-smi successfully. Reinstall the NVIDIA Container Toolkit, run sudo nvidia-ctk runtime configure --runtime=docker, then sudo systemctl restart docker. Also ensure your compose file sets runtime: nvidia or uses --gpus all if running directly with docker run.
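Two quick host-side checks can narrow this down (file contents and driver package names may differ slightly on your system):

# 1) Docker should list the nvidia runtime after nvidia-ctk configured it
cat /etc/docker/daemon.json        # should contain an "nvidia" runtimes entry
docker info | grep -i runtimes     # should include "nvidia"

# 2) If nvidia-smi fails on the host itself, (re)install the driver first
sudo ubuntu-drivers install        # or: sudo apt install -y nvidia-driver-535
sudo reboot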
Out of memory or slow performance: Start with smaller models (e.g., 3B–7B). For better throughput, set num_ctx and num_gpu options when creating models with Ollama, and close other GPU-heavy apps.
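For example, you can build a lower-context variant of a model with a Modelfile; the names and path below are illustrative and assume llama3 is already pulled:

# write a minimal Modelfile into the ollama-data volume
docker exec -it ollama sh -c 'printf "FROM llama3\nPARAMETER num_ctx 2048\n" > /root/.ollama/Modelfile.small'
# build and run the smaller-context variant
docker exec -it ollama ollama create llama3-small -f /root/.ollama/Modelfile.small
docker exec -it ollama ollama run llama3-small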
Port conflicts: Change host port mappings in the compose file, e.g., "8081:8080" for Open WebUI or "11435:11434" for Ollama, and redeploy.
9) Notes for AMD/Apple users
This tutorial targets NVIDIA on Linux. For AMD GPUs on Linux, investigate ROCm builds of Ollama and ensure your GPU is supported by ROCm. On Apple Silicon, you can run both services natively or with Docker Desktop; GPU acceleration leverages Apple’s Metal backend automatically in the native build.
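For reference, Ollama publishes a ROCm image for AMD GPUs on Linux; a hedged example of running it directly with docker run (device paths as documented by Ollama, untested in this guide):

docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama-data:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm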
Wrap-up
You’ve deployed a modern local AI stack with Ollama and Open WebUI using Docker and enabled NVIDIA GPU acceleration on Ubuntu. With persistent storage, easy upgrades, and optional HTTPS, this setup is production-friendly for personal research, prototyping, and helpdesk automations. Experiment with different models, tweak performance flags, and keep your system updated for the best results.