Overview
Running a local large language model (LLM) is easier than ever thanks to Ollama and Open WebUI. Ollama handles model downloads and inference, while Open WebUI gives you a clean, chat-style interface in your browser. In this tutorial, you'll install both on Ubuntu 24.04 (works on 22.04 too), enable NVIDIA GPU acceleration, and deploy them with Docker Compose. The result is a fast, private AI stack you control.
What You'll Need
- Ubuntu 24.04 or 22.04 (fresh or existing server/desktop).
- An NVIDIA GPU with recent drivers (Turing/RTX or newer recommended).
- Root or sudo access.
- Open ports 3000 (Open WebUI) and 11434 (Ollama) on your firewall if you will access the stack remotely (a ufw sketch follows this list).
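If you use ufw (Ubuntu's default firewall frontend), a minimal sketch might look like the following; 192.168.1.0/24 is only an example subnet, so substitute your own trusted range:
# Keep SSH reachable before enabling the firewall
sudo ufw allow 22/tcp
# Allow Open WebUI and the Ollama API from a trusted subnet only (example range)
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw enable
sudo ufw status verbose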
Step 1: Install NVIDIA Drivers and Verify CUDA
First, update your system and install the recommended NVIDIA driver. On Ubuntu Desktop you can use the Additional Drivers tool, but the CLI route works everywhere:
sudo apt update && sudo apt -y upgrade
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the GPU is visible:
nvidia-smi
If you see a driver table with your GPU, you're set. If not, re-run the install or check Secure Boot status (disable or enroll the MOK as needed).
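If you suspect Secure Boot, you can check its state and confirm whether the NVIDIA kernel module loaded; this assumes the mokutil package, which ships on most Ubuntu installs:
# Reports "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state
# The nvidia modules should appear here once the driver is working
lsmod | grep nvidia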
Step 2: Install Docker, Compose, and NVIDIA Container Toolkit
Install Docker from the official repository so you get the latest engine and the Compose plugin:
sudo apt-get remove -y docker docker.io containerd runc || true
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Add the NVIDIA Container Toolkit so Docker can pass the GPU through to containers. NVIDIA's current apt repository is distribution-agnostic, so the same commands work on 22.04 and 24.04:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify Docker can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Step 3: Create a Docker Compose File for Ollama and Open WebUI
Create a working directory and a docker-compose.yml:
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following content, then save:
version: "3.9"services: ollama: image: ollama/ollama:latest container_name: ollama restart: unless-stopped ports: - "11434:11434" volumes: - ollama:/root/.ollama environment: - OLLAMA_KEEP_ALIVE=24h gpus: all openwebui: image: ghcr.io/open-webui/open-webui:main container_name: open-webui restart: unless-stopped ports: - "3000:8080" environment: - OLLAMA_API_BASE_URL=http://ollama:11434 - ENABLE_AUTH=True volumes: - openwebui:/app/backend/data depends_on: - ollamavolumes: ollama: openwebui:
Bring everything up:
docker compose up -d
Open WebUI will be available at http://<your-server-ip>:3000 and Ollama's API at http://<your-server-ip>:11434.
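Before opening the browser, you can sanity-check the stack from the shell. The /api/tags endpoint lists the models Ollama has stored locally, so an empty list at this point is expected:
# Both containers should show as running
docker compose ps
# Ollama's API answers on port 11434 (no models pulled yet)
curl http://localhost:11434/api/tags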
Step 4: Pull a Model and Test Inference
Use Ollama to pull an LLM. Llama 3 8B is a good starting point if you have at least ~8–10 GB of free VRAM:
docker exec -it ollama ollama pull llama3:8b
You can test quickly from the CLI:
docker exec -it ollama ollama run llama3:8b
Or open your browser and navigate to Open WebUI (port 3000). Create an account on first visit, select the model you pulled, and start chatting. If the GPU is being used, you should see activity in:
watch -n 1 nvidia-smi
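You can also drive the model through Ollama's HTTP API, which is handy for scripting; the prompt below is just a placeholder, and "stream": false requests a single JSON response instead of a token stream:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Explain Docker volumes in one sentence.",
  "stream": false
}'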
Step 5: Secure and Maintain the Stack
- Firewall: Allow only needed ports (adjust to your network policy). For local-only use, block remote access to 3000/11434.
- Reverse proxy: For TLS and a friendly domain, put Nginx or Caddy in front of Open WebUI and obtain a Let's Encrypt certificate (a minimal Caddy example follows the backup commands below).
- Updates: Keep images fresh and restart the stack regularly:
docker compose pull
docker compose up -d
Back up the volumes so you don't lose chats or downloaded models. Note that Compose prefixes volume names with the project name (the directory name by default), so confirm the exact names with docker volume ls; with the directory used above they will be ollama-openwebui_ollama and ollama-openwebui_openwebui:
docker run --rm -v ollama-openwebui_ollama:/data -v "$(pwd)":/backup alpine tar czf /backup/ollama-vol.tgz -C /data .
docker run --rm -v ollama-openwebui_openwebui:/data -v "$(pwd)":/backup alpine tar czf /backup/openwebui-vol.tgz -C /data .
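As a sketch of the reverse-proxy setup mentioned above, a minimal Caddyfile could look like this; ai.example.com is a placeholder domain pointed at this server, and Caddy requests the Let's Encrypt certificate automatically:
ai.example.com {
    # Caddy terminates TLS and forwards requests to Open WebUI
    reverse_proxy 127.0.0.1:3000
}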
Troubleshooting
- No GPU in containers: Confirm the toolkit is active. Check docker info | grep -i nvidia. Re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker.
- Model out-of-memory (OOM): Use a smaller model or quantized variant (e.g., llama3:8b-instruct-q4_0). Close other GPU apps. You can also reduce context in Open WebUI settings.
- Slow generation: Ensure you're not falling back to CPU (watch nvidia-smi). Update drivers and Docker images. Use recent CUDA-compatible drivers (550+ often recommended).
- Open WebUI cannot reach Ollama: Check that the OLLAMA_BASE_URL environment variable is set to http://ollama:11434. View logs with docker logs open-webui and docker logs ollama.
- Port conflicts: Change the host ports in docker-compose.yml (e.g., map "127.0.0.1:3000:8080" to bind only locally).
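For the local-only binding in the last item, the relevant excerpt of docker-compose.yml would look like this (the rest of the service definition stays the same):
  openwebui:
    ports:
      - "127.0.0.1:3000:8080"   # reachable only from this machine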
Where Models Are Stored and How to Clean Up
Models live in the Ollama volume (/root/.ollama inside the container). To list installed models:
docker exec -it ollama ollama list
Remove a model you no longer need:
docker exec -it ollama ollama rm llama3:8b
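To see how much disk space the model store is using, you can ask Docker for per-volume usage, or measure the models directory inside the container (the latter assumes the image ships coreutils, which the current Ubuntu-based ollama/ollama image does):
# Per-volume sizes; look for the volume ending in _ollama
docker system df -v
# Size of the model store inside the container
docker exec -it ollama du -sh /root/.ollama/models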
If you ever want to stop and remove the stack:
docker compose down
To reclaim space including volumes (this deletes your models and chat history), run:
docker compose down -v
Wrap-Up
You now have a private, GPU-accelerated LLM environment powered by Ollama and Open WebUI on Ubuntu. With Docker Compose, updates and maintenance are straightforward, and volumes keep your data persistent. From here, try different models (Mistral, Phi-3, Llama 3 Instruct), experiment with prompt templates, and fine-tune performance for your hardware. Enjoy your local AI workstation or server—no cloud required.
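For example, Mistral 7B and Phi-3 are available in the Ollama library under these names at the time of writing:
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull phi3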