Overview
This step-by-step guide shows how to deploy Ollama and Open WebUI on Ubuntu with NVIDIA GPU acceleration using Docker. You will run large language models locally, manage them in a user-friendly web interface, and expose an OpenAI-compatible API for your apps. The tutorial is designed for Ubuntu 22.04 or 24.04 and focuses on a secure, reproducible, and easily maintainable setup.
What You Will Build
You will run two containers: Ollama (the local LLM runtime and API) and Open WebUI (a modern web UI for chat, prompts, and model management). The stack runs on Docker with NVIDIA GPU acceleration via the NVIDIA Container Toolkit, giving you faster inference and the ability to run larger models locally.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access
- An NVIDIA GPU with recent drivers (Turing or newer recommended)
- At least 16 GB of system RAM for 7–8B models; larger models need proportionally more RAM and VRAM
- Internet access to pull Docker images and models
Step 1 — Install NVIDIA Drivers and Container Toolkit
If you have not installed NVIDIA drivers, use Ubuntu’s recommended driver installer:
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, verify the GPU:
nvidia-smi
Install the NVIDIA Container Toolkit so Docker can access the GPU. The configure step below requires Docker; if Docker is not installed yet, complete Step 2 first and then return here:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
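Once Docker is installed (see Step 2 if you need it), you can verify that containers see the GPU by running nvidia-smi in a throwaway CUDA container; the image tag here is just one example and may need adjusting to a tag available for your setup:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
If this prints the same GPU table you saw on the host, the toolkit is wired up correctly.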
Step 2 — Install Docker Engine and Compose Plugin
If Docker is not installed, install it from the official repository:
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Verify:
docker --version && docker compose version
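Optionally, add your user to the docker group so you can run docker without sudo. Log out and back in for it to take effect, and note that docker group membership is effectively root-equivalent:
sudo usermod -aG docker $USER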
Step 3 — Start Ollama with GPU Acceleration
Create a dedicated Docker network and volume, then run Ollama:
docker network create ai || true
docker volume create ollama
docker run -d --name ollama --restart unless-stopped --gpus all \
-p 11434:11434 -v ollama:/root/.ollama \
-e OLLAMA_ORIGINS="http://localhost:3000,http://127.0.0.1:3000" \
--network ai ollama/ollama:latest
Confirm it is running:
docker logs -f ollama
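You can also query the API directly from the host; a healthy instance answers on port 11434:
curl http://localhost:11434/api/version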
Step 4 — Launch Open WebUI
Run the web UI, give it a named volume so chats and settings persist, and point it at the Ollama container:
docker run -d --name open-webui --restart unless-stopped \
  -p 3000:8080 --network ai \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  ghcr.io/open-webui/open-webui:main
Open your browser to http://localhost:3000 (or the server’s IP:3000). Create an admin account and adjust settings as needed.
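If the page does not load, check the container logs for errors:
docker logs -f open-webui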
Step 5 — Pull a Model and Test
Use the Ollama CLI inside the container to download a model, for example Llama 3.1 8B:
docker exec -it ollama ollama pull llama3.1:8b
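To see which models are already downloaded:
docker exec -it ollama ollama list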
Generate text via the API to verify everything works:
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Write a haiku about GPUs."}'
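By default, /api/generate streams the response as newline-delimited JSON. To get a single JSON object instead, disable streaming:
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Write a haiku about GPUs.","stream":false}'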
In Open WebUI, choose the model from the dropdown and start chatting. You can download multiple models and switch between them.
Step 6 — Use the OpenAI-Compatible API
Ollama exposes an OpenAI-style API under /v1. Point your clients at the local endpoint; most SDKs require a key, but Ollama ignores its value:
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=not-needed
Python example with the OpenAI SDK (chat completions):
pip install openai
python - <<'PY'
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")
resp = client.chat.completions.create(model="llama3.1:8b", messages=[{"role":"user","content":"Explain vector databases in one paragraph."}])
print(resp.choices[0].message.content)
PY
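The same endpoint works from any HTTP client. For example, with curl against the chat completions route:
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Say hello in five words."}]}'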
Troubleshooting
- If nvidia-smi fails inside containers, re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker.
- If the UI cannot see models, confirm OLLAMA_BASE_URL is correct and that both containers are on the same Docker network (see the check below).
- For model download failures, check disk space and retry: docker exec -it ollama ollama pull MODEL_NAME.
- If ports are in use, change the host ports (for example, -p 11435:11434 and -p 3001:8080).
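For the network check mentioned above, this prints the names of all containers attached to the ai network; both ollama and open-webui should appear:
docker network inspect ai --format '{{range .Containers}}{{.Name}} {{end}}'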
Security and Best Practices
- Do not expose port 11434 to the internet without a reverse proxy and auth; bind to localhost or your private network only.
- In Open WebUI, enable authentication and restrict sign-ups in the admin settings.
- Keep images updated: docker pull ollama/ollama:latest && docker pull ghcr.io/open-webui/open-webui:main, then recreate the containers.
- Back up volumes regularly, for example: docker run --rm -v ollama:/data -v $(pwd):/backup busybox tar czf /backup/ollama-backup.tgz -C / data (a matching restore is shown below).
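To restore that backup onto a fresh volume, reverse the operation; this sketch assumes the ollama-backup.tgz from the command above sits in your current directory:
docker run --rm -v ollama:/data -v $(pwd):/backup busybox tar xzf /backup/ollama-backup.tgz -C /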
Optional: docker compose
Prefer a single-file deployment? Create compose.yaml:
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports: ["11434:11434"]
    volumes: ["ollama:/root/.ollama"]
    environment:
      - OLLAMA_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports: ["3000:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes: ["open-webui:/app/backend/data"]
    depends_on: ["ollama"]
volumes:
  ollama:
  open-webui:
Start with docker compose up -d. This file is easy to version-control and redeploy on another machine.
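Upgrading later takes two commands from the same directory:
docker compose pull
docker compose up -d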
Wrap-Up
You now have a fast, private, and flexible local AI stack running on Ubuntu with GPU support. Ollama handles model execution and exposes an OpenAI-compatible API; Open WebUI provides a polished interface for daily use. With Docker, upgrades and backups are simple, and you can iterate quickly as new models and features arrive.