How to Run a Local AI Chat with Ollama and Open WebUI on Ubuntu 24.04 (GPU-Ready)

Overview

This tutorial shows how to deploy a private, fast, and GPU-accelerated local AI chat using Ollama and Open WebUI on Ubuntu 24.04. Ollama manages large language models (LLMs) such as Llama 3, while Open WebUI provides a friendly web interface for chatting, prompt management, and basic workflow tools. We will use Docker to keep the setup clean and reproducible, with steps for both CPU-only and NVIDIA GPU acceleration.

Prerequisites

OS: Ubuntu 24.04 (also works on 22.04).
Access: a sudo-enabled user.
Hardware: 8 GB RAM minimum for small models; an NVIDIA GPU (optional) with at least 8 GB VRAM for faster inference.
Network: internet access to pull images and models.

Step 1 — Install Docker and basic tools

Update the system and install Docker using the official convenience script, then add your user to the docker group so you can run containers without sudo. Note that newgrp docker only affects the current shell session; log out and back in so the group membership applies to new sessions as well.
sudo apt update && sudo apt -y upgrade
sudo apt -y install curl ca-certificates gnupg lsb-release
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER && newgrp docker
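
To confirm Docker works without sudo, you can run the hello-world image from Docker Hub in the new shell:
docker run --rm hello-world
If it prints the "Hello from Docker!" message, the installation and group change are in effect.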

Step 2 — (Optional) Enable NVIDIA GPU acceleration

If you have an NVIDIA GPU, install the recommended driver, then the NVIDIA Container Toolkit so Docker can use your GPU.
Install driver:
sudo ubuntu-drivers autoinstall && sudo reboot
After reboot, verify:
nvidia-smi

Install NVIDIA Container Toolkit:
sudo bash -c 'curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg'
sudo bash -c 'curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#" > /etc/apt/sources.list.d/nvidia-container-toolkit.list'
sudo apt update && sudo apt -y install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
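
To confirm Docker picked up the NVIDIA runtime, a quick check of the daemon info should list it:
docker info | grep -i runtimes
The output should mention nvidia alongside the default runc runtime; if it does not, re-run the nvidia-ctk command and restart Docker again.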

Step 3 — Create a dedicated Docker network and volumes

A dedicated network lets the containers reach each other by name, and named volumes keep model and chat data persistent across container recreation.
docker network create ai
docker volume create ollama
docker volume create open-webui
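
If you want to double-check, list the new network and volumes:
docker network ls
docker volume ls
The ai network and both volumes should appear in the listings.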

Step 4 — Start Ollama (CPU or GPU)

CPU-only:
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest

GPU-enabled (NVIDIA):
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 --gpus all -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -v ollama:/root/.ollama ollama/ollama:latest
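
Before moving on, you can confirm the Ollama API is reachable on the published port:
curl http://localhost:11434
A healthy container answers with "Ollama is running"; if it does not, check docker logs ollama.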

Step 5 — Download a model

Pull a model inside the Ollama container. Llama 3 8B is a good default: it runs at usable speed on modern CPUs and comfortably on mid-range GPUs with 8 GB of VRAM; Mistral 7B is another solid option.
docker exec -it ollama ollama pull llama3:8b
Alternative models: docker exec -it ollama ollama pull mistral:7b
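To see which models are available locally (and how much disk space they use), list them inside the container:
docker exec -it ollama ollama list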

Step 6 — Launch Open WebUI

Run the Open WebUI container and point it to the Ollama service via the Docker network.
docker run -d --name open-webui --restart unless-stopped --network ai -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_API_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest

Open a browser to http://SERVER_IP:3000. Create your admin account on first login. In Settings, select the downloaded model (for example, llama3:8b) and start chatting.
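
If you prefer to confirm the model responds before opening the browser, you can query the Ollama API directly; this example assumes you pulled llama3:8b in Step 5:
curl http://localhost:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Say hello in one short sentence.", "stream": false}'
The response is a single JSON object whose response field contains the model's reply.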

Security hardening

If you only need local access, bind ports to localhost by replacing -p 3000:8080 and -p 11434:11434 with -p 127.0.0.1:3000:8080 and -p 127.0.0.1:11434:11434. For remote access, put a reverse proxy like Nginx or Caddy in front with HTTPS and authentication. Also consider firewall rules to restrict inbound connections to required IPs.
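
For example, a minimal Caddyfile is enough to terminate TLS in front of Open WebUI; this sketch assumes you own a domain such as chat.example.com with a DNS record pointing at this server, and that Caddy runs on the host with ports 80 and 443 open:
chat.example.com {
    reverse_proxy 127.0.0.1:3000
}
Caddy obtains and renews the certificate automatically; pair it with Open WebUI's built-in login (or HTTP basic auth at the proxy) so the chat is not exposed anonymously.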

Model and app updates

Update images and models periodically. Pull latest images and recreate containers:
docker pull ollama/ollama:latest && docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama && docker rm open-webui ollama
Then re-run the docker run commands from Steps 4 and 6. The named volumes are untouched, so your models and chat data are preserved.
Update models as needed:
docker exec -it ollama ollama pull llama3:8b

Backup and restore

Back up persistent data via the named volumes. From the directory where you want the archives to land:
docker run --rm -v ollama:/data -v $PWD:/backup alpine tar czf /backup/ollama-models.tgz -C /data .
docker run --rm -v open-webui:/data -v $PWD:/backup alpine tar czf /backup/open-webui-data.tgz -C /data .
Restore by creating volumes and extracting archives back into them using similar commands.
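A minimal restore sketch for the Ollama volume, assuming the .tgz archives from above are in the current directory (run it before recreating the containers, and repeat with open-webui for the other volume):
docker volume create ollama
docker run --rm -v ollama:/data -v $PWD:/backup alpine tar xzf /backup/ollama-models.tgz -C /data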

Troubleshooting

GPU not detected in containers: Ensure nvidia-smi works on the host, the NVIDIA Container Toolkit is installed, and you used --gpus all. Restart Docker after configuring the toolkit. Verify with docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi.

Port conflicts: If ports 3000 or 11434 are in use, change the host ports (e.g., -p 3001:8080).

Slow performance: Prefer GPU models when available, close background apps, and choose smaller models like llama3:8b or mistral:7b. For CPU, disable power saving and use a recent CPU with AVX2.
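
To check whether your CPU exposes AVX2, grep the kernel's CPU flags:
grep -m1 -o avx2 /proc/cpuinfo
If the command prints avx2, the instruction set is available; no output means it is not.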

Clean removal: Stop and remove everything with:
docker rm -f open-webui ollama
docker volume rm open-webui ollama
docker network rm ai

What you built

You now have a modern, private AI stack running locally: Ollama serving LLMs and Open WebUI delivering a polished chat interface. It is portable via Docker, secure when bound to localhost or proxied with TLS, and easy to update. This setup is ideal for prototyping prompts, experimenting with different models, and keeping sensitive data on your own machine.