How to Run Ollama and Open WebUI on Ubuntu 24.04 with NVIDIA GPU (Docker Guide)

Overview

This step-by-step guide shows you how to deploy Ollama and Open WebUI on Ubuntu 24.04 with NVIDIA GPU acceleration using Docker. With this setup, you can run modern large language models (LLMs) locally, manage them from a clean web interface, and take full advantage of your GPU for high performance. The process covers NVIDIA drivers, Docker, the NVIDIA Container Toolkit, and secure, persistent containers that survive reboots.

What You Will Need

You need a 64-bit Ubuntu 24.04 host with an NVIDIA GPU (Turing or newer recommended), Internet access, a user with sudo rights, and at least 20 GB of free disk space for models. If you are working on a remote server, make sure ports 3000 (Open WebUI) and 11434 (Ollama) are reachable or routed through a reverse proxy.
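
UFW is not covered elsewhere in this guide, but if the host uses Ubuntu's default UFW firewall, a minimal sketch for opening the two ports looks like the following; skip it if you manage the firewall differently or put everything behind a reverse proxy, and think twice before exposing the raw Ollama API on 11434 to the Internet.

# Allow the Open WebUI interface and (optionally) the Ollama API through UFW
sudo ufw allow 3000/tcp
# Only open 11434 if remote clients need direct API access
sudo ufw allow 11434/tcp
sudo ufw status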

1) Install NVIDIA Drivers

First, install the official NVIDIA driver so CUDA can talk to your GPU. Run: sudo ubuntu-drivers autoinstall. When it finishes, reboot with sudo reboot. After the reboot, verify the GPU is visible: nvidia-smi. You should see your GPU name and driver version. If you do not, confirm Secure Boot is disabled or enroll the driver's Machine Owner Key (MOK) when prompted, then repeat the check.
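
The driver steps collected in order; ubuntu-drivers devices is an optional extra check that lists the detected GPU and the recommended driver package before you install anything.

# Optional: list detected GPUs and the recommended driver package
ubuntu-drivers devices

# Install the recommended NVIDIA driver and reboot
sudo ubuntu-drivers autoinstall
sudo reboot

# After the reboot, confirm the driver can see the GPU
nvidia-smi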

2) Install Docker Engine on Ubuntu 24.04

Set up Docker from the official repository for the best stability and feature set. Install the prerequisites: sudo apt update && sudo apt install -y ca-certificates curl gnupg. Next, add Docker’s key and repository by running these three commands in order: sudo install -m 0755 -d /etc/apt/keyrings; then curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg; and finally echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu noble stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null. Then install the packages: sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin. To run Docker without sudo, add yourself to the docker group: sudo usermod -aG docker $USER, then newgrp docker. The full sequence is also collected as a copy-paste block below.
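
The same Docker installation commands, gathered into one copy-paste sequence with brief comments:

# Prerequisites
sudo apt update && sudo apt install -y ca-certificates curl gnupg

# Docker's GPG key and the Ubuntu 24.04 (noble) repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu noble stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Docker Engine and plugins
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Optional: run Docker without sudo (newgrp applies the new group in the current shell)
sudo usermod -aG docker $USER
newgrp docker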

3) Enable GPU Access in Containers (NVIDIA Container Toolkit)

Install the NVIDIA Container Toolkit so Docker can pass your GPU into containers. Add the key and repository by running these two commands in order: curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg; then curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list. Install and configure it: sudo apt update && sudo apt install -y nvidia-container-toolkit, followed by sudo nvidia-ctk runtime configure --runtime=docker and sudo systemctl restart docker. Test GPU passthrough: docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu22.04 nvidia-smi. You should see your GPU listed inside the container. The same commands are collected below.
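
The toolkit setup from this step as one sequence, ending with the in-container smoke test:

# Repository key and source list for the NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the runtime with Docker, and restart Docker
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Smoke test: nvidia-smi should print the GPU from inside a CUDA container
docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu22.04 nvidia-smi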

4) Create a Dedicated Network for AI Services

Create a user-defined Docker network so containers can discover each other cleanly: docker network create ai. This network isolates traffic and lets Open WebUI talk to the Ollama container by name.
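
To create the network and confirm it is in place; once steps 5 and 7 are done, both containers should appear in the inspect output:

# Create the user-defined bridge network
docker network create ai

# Verify it exists; after steps 5 and 7 the ollama and open-webui containers are listed here
docker network ls
docker network inspect ai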

5) Run the Ollama Container with GPU Support

Start Ollama and persist its model data in a Docker volume. Run: docker run -d --name ollama --gpus all --restart unless-stopped -p 11434:11434 -v ollama:/root/.ollama --network ai ollama/ollama:latest. The container exposes the Ollama API on port 11434. Check logs with docker logs -f ollama to ensure the server starts without errors.
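
The same run command broken across lines for readability, followed by two quick checks from the host. The curl checks assume the default port mapping above; /api/version and /api/tags are standard endpoints of Ollama's HTTP API.

docker run -d --name ollama --gpus all --restart unless-stopped \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  --network ai \
  ollama/ollama:latest

# Follow the startup logs (Ctrl+C to stop)
docker logs -f ollama

# The API should answer on the published port
curl http://localhost:11434/api/version
curl http://localhost:11434/api/tags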

6) Pull a Model (Llama 3.1 example)

Use Ollama’s CLI inside the container to download a model. For a great balance of speed and quality on consumer GPUs, try an 8B model: docker exec -it ollama ollama pull llama3.1:8b. If you have a smaller GPU (e.g., 6–8 GB VRAM), try a quantized variant like llama3.1:8b-instruct-q4_K_M. You can list models with docker exec -it ollama ollama list.
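
A short end-to-end check after the pull, assuming the llama3.1:8b tag from above. ollama run accepts a one-off prompt on the command line, and the HTTP API exposes the same model via /api/generate; the prompt text here is just an example.

# Download and list models
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama list

# One-off prompt via the CLI
docker exec -it ollama ollama run llama3.1:8b "Say hello in one sentence."

# Equivalent request via the HTTP API
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'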

7) Deploy Open WebUI and Connect to Ollama

Open WebUI provides a friendly interface to chat with models, manage prompts, and configure settings. Start it with: docker run -d --name open-webui --restart unless-stopped -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -v openwebui:/app/backend/data --network ai ghcr.io/open-webui/open-webui:main. Open http://<your_server_ip>:3000 in a browser, create your first user (the first account becomes the admin), and pick the model you pulled in the previous step. You can now chat with the LLM directly from your browser.
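
The run command again, broken across lines, followed by two basic checks. OLLAMA_BASE_URL points Open WebUI at the ollama container by its DNS name on the ai network; the curl check simply confirms the interface answers on port 3000.

docker run -d --name open-webui --restart unless-stopped \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v openwebui:/app/backend/data \
  --network ai \
  ghcr.io/open-webui/open-webui:main

# Watch the startup logs (Ctrl+C to stop)
docker logs -f open-webui

# The web interface should respond on the published port
curl -I http://localhost:3000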

8) Optional: Secure Access with HTTPS

For Internet-facing servers, place a reverse proxy with TLS in front of Open WebUI. A simple approach is Caddy or Nginx Proxy Manager. Point your domain’s DNS to the server, terminate HTTPS on the proxy, and forward to localhost:3000. If you already use Traefik or Nginx, add routes with Let’s Encrypt certificates and restrict access using basic auth or OAuth.
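
As one concrete sketch (not the only option), Caddy can terminate TLS and proxy to Open WebUI. Instead of forwarding to localhost:3000, the variant below joins the ai network and proxies straight to the container's internal port 8080, which avoids exposing port 3000 publicly. chat.example.com is a placeholder; replace it with your own domain, which must already resolve to the server.

# Write a minimal Caddyfile (replace chat.example.com with your domain)
mkdir -p ~/caddy
cat > ~/caddy/Caddyfile <<'EOF'
chat.example.com {
    reverse_proxy open-webui:8080
}
EOF

# Run Caddy on the same Docker network; it obtains and renews Let's Encrypt certificates automatically
docker run -d --name caddy --restart unless-stopped \
  -p 80:80 -p 443:443 \
  -v ~/caddy/Caddyfile:/etc/caddy/Caddyfile \
  -v caddy_data:/data \
  --network ai \
  caddy:latest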

Maintenance and Updates

To update Ollama or Open WebUI, pull new images and recreate the containers. Run: docker pull ollama/ollama:latest and docker pull ghcr.io/open-webui/open-webui:main, then docker stop ollama open-webui and docker rm ollama open-webui. Start them again using the same docker run commands; your data persists in the ollama and openwebui volumes. To back up models and settings, archive the volumes: sudo tar -czf ollama-vol.tgz -C /var/lib/docker/volumes/ollama/_data . and sudo tar -czf openwebui-vol.tgz -C /var/lib/docker/volumes/openwebui/_data . (an alternative that avoids Docker's internal paths is shown below).
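
The update cycle as one sequence, plus an alternative backup approach that streams each named volume through a throwaway container instead of reading /var/lib/docker directly; busybox is used here only because it is a small image with tar available.

# Update: pull new images, then recreate the containers
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:main
docker stop ollama open-webui
docker rm ollama open-webui
# Re-run the docker run commands from steps 5 and 7; the ollama and openwebui volumes are reused

# Alternative backup: archive each named volume via a temporary container
docker run --rm -v ollama:/data -v "$PWD":/backup busybox tar czf /backup/ollama-vol.tgz -C /data .
docker run --rm -v openwebui:/data -v "$PWD":/backup busybox tar czf /backup/openwebui-vol.tgz -C /data .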

Troubleshooting

If the GPU is not detected inside containers, confirm the host driver works with nvidia-smi. Then verify the runtime is configured: docker info | grep -i nvidia. If missing, re-run sudo nvidia-ctk runtime configure --runtime=docker and sudo systemctl restart docker. For permission errors when running Docker, add your user to the docker group as shown above. If downloads are slow or models fail due to VRAM limits, choose smaller or quantized models (e.g., q4_K_M or q5_K_M).
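
The checks above gathered into a short diagnostic checklist, in the order you would normally run them:

# 1. Host driver works?
nvidia-smi

# 2. Docker knows about the nvidia runtime?
docker info | grep -i nvidia

# 3. If not, reconfigure the runtime and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# 4. GPU visible inside a container?
docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu22.04 nvidia-smi

# 5. Check container logs for runtime errors
docker logs --tail 100 ollama
docker logs --tail 100 open-webui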

What You Get

After following these steps, you have a modern, GPU-accelerated local AI stack. Ollama handles model serving and efficient execution, and Open WebUI gives you a clean chat interface, prompt management, and multi-model control. Because everything runs in Docker with persistent volumes, updates and backups are easy, and you can run this setup on a workstation or a headless server with minimal changes.
