Overview
This step-by-step guide shows how to run open-source large language models (LLMs) locally on Ubuntu 24.04 using Ollama for model serving and Open WebUI for a friendly chat interface. You will install Ollama, enable optional GPU acceleration (NVIDIA or CPU fallback), and deploy Open WebUI with Docker. The result is a private, fast, and controllable AI setup suitable for home labs and small teams.
Prerequisites
You need an Ubuntu 24.04 LTS host with internet access, a user with sudo rights, and at least 8 GB of RAM. A modern NVIDIA GPU is optional but recommended for faster inference. Make sure the system is up to date: sudo apt update && sudo apt -y upgrade
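A quick way to confirm the release, memory, and free disk space before you begin (standard Ubuntu tools, nothing specific to this stack): lsb_release -d
free -h
df -h /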
Step 1 — Install Ollama
Ollama is a lightweight server that downloads and runs models locally. Install it with the official script: curl -fsSL https://ollama.com/install.sh | sh
Enable and start the service so it runs at boot: sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
Verify the API is listening on port 11434: curl http://127.0.0.1:11434/api/tags
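On a fresh install the endpoint should return an empty model list, since nothing has been pulled yet. You can also sanity-check the CLI directly: ollama --version
ollama list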
Step 2 — Optional: Enable GPU Acceleration (NVIDIA)
If you have an NVIDIA GPU, install the recommended driver. Ubuntu makes this easy: sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the driver is active: nvidia-smi
Ollama detects GPUs automatically when drivers are present; no extra flags are required. If you need to force CPU-only inference on an NVIDIA system, hide the GPUs from the server by setting CUDA_VISIBLE_DEVICES to an invalid ID such as -1 in the service environment. Note that exporting variables in your shell profile does not affect the systemd-managed server; set them on the service itself and restart Ollama: sudo systemctl restart ollama
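A minimal sketch of a persistent CPU-only setting using a systemd override (drop the variable again later to re-enable the GPU). Run sudo systemctl edit ollama and add:
[Service]
Environment="CUDA_VISIBLE_DEVICES=-1"
Then apply it: sudo systemctl daemon-reload
sudo systemctl restart ollama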
AMD GPUs can work with ROCm on supported cards and drivers. If you are using AMD, install the ROCm runtime from AMD’s repository for Ubuntu 24.04, confirm with rocminfo, and ensure your user is in the video and render groups. If ROCm is not available for your hardware, Ollama will fall back to CPU.
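As a rough sketch for the AMD path (assuming the ROCm packages are already installed; log out and back in after the group change): sudo usermod -aG video,render $USER
rocminfo | grep -i gfx
If rocminfo lists a gfx… agent for your card, Ollama should be able to use it.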
Step 3 — Pull a Model and Test Locally
Pull a well-supported model. Llama 3 is a popular choice: ollama pull llama3
Run a quick test: ollama run llama3 "Write one sentence about Ubuntu 24.04."
Tip: For smaller footprints, choose lighter models such as phi3 or tinyllama (the plain llama3 tag is already the 8B variant). VRAM needs vary; an 8B model typically benefits from 8–12 GB of GPU VRAM, while CPU-only runs need more system RAM and patience.
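Open WebUI talks to the same HTTP API you tested in Step 1, so it is worth knowing how to call it directly. A quick generation request with curl (the prompt text is arbitrary): curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'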
Step 4 — Install Docker and Open WebUI
Open WebUI gives you a clean browser interface for Ollama. Install Docker from the Ubuntu repositories for a quick start (the Compose plugin is packaged as docker-compose-v2 and is optional here, since this guide uses plain docker run): sudo apt install -y docker.io
Allow your user to manage Docker without sudo, then re-login: sudo usermod -aG docker $USER
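After logging back in (or running newgrp docker in the current shell), confirm that Docker works without sudo: docker run --rm hello-world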
Create a persistent volume for Open WebUI data and start the container. Inside the container, 127.0.0.1 points at the container itself, not the host, so map host.docker.internal to the host gateway and use it as the Ollama address: docker volume create openwebui
docker run -d --name open-webui -p 3000:8080 --restart unless-stopped --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v openwebui:/app/backend/data ghcr.io/open-webui/open-webui:latest
Because Ollama listens only on 127.0.0.1 by default, it also needs to accept connections from the Docker bridge. Set OLLAMA_HOST=0.0.0.0 on the Ollama service the same way as the GPU variable in Step 2 (sudo systemctl edit ollama, add Environment="OLLAMA_HOST=0.0.0.0" under [Service], then sudo systemctl restart ollama), and keep port 11434 closed on your firewall if you do not want it reachable from the rest of your network.
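Before moving on, check that the container is up and watch its startup logs; the first launch can take a minute while Open WebUI initializes: docker ps --filter name=open-webui
docker logs -f open-webui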
Open your browser to http://SERVER_IP:3000 and complete the initial admin setup. Add a model in Settings if it does not appear automatically, for example llama3.
Step 5 — Optional TLS with Caddy (Automatic HTTPS)
If you have a domain pointing to your server (A record), Caddy can auto-provision HTTPS certificates. Install it and configure a simple reverse proxy: sudo apt install -y caddy
Edit /etc/caddy/Caddyfile (replace ai.example.com with your domain): ai.example.com {
reverse_proxy 127.0.0.1:3000
}
Reload Caddy: sudo systemctl reload caddy. Visit https://ai.example.com. Ensure ports 80 and 443 are open on your firewall and router.
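If UFW is your firewall, opening the web ports might look like this (adjust for a different firewall or a cloud security group); caddy validate is also handy for checking the Caddyfile whenever you change it: sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo caddy validate --config /etc/caddy/Caddyfile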
Step 6 — Backups and Updates
Ollama models are stored under /usr/share/ollama/.ollama when the installer's systemd service is in use (as in this guide), or under ~/.ollama if you run the server manually as your own user. Back up the model directory to avoid re-downloading models. Example for the service install: sudo tar czf ollama-backup.tgz /usr/share/ollama/.ollama
Open WebUI data is in the Docker volume openwebui. Back it up with: docker run --rm -v openwebui:/data -v $(pwd):/backup alpine sh -c "cd /data && tar czf /backup/openwebui-backup.tgz ."
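To restore, extract the archive back into the same volume. Note this overwrites whatever is currently in the volume, and it assumes the backup file sits in your current directory: docker run --rm -v openwebui:/data -v $(pwd):/backup alpine sh -c "cd /data && tar xzf /backup/openwebui-backup.tgz"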
To update Ollama: curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl restart ollama. To update Open WebUI: docker pull ghcr.io/open-webui/open-webui:latest && docker stop open-webui && docker rm open-webui, then re-run the full docker run command from Step 4; your chats and settings persist in the openwebui volume.
Troubleshooting
If port 11434 or 3000 is in use, change the published port in the docker run command or stop the conflicting service. For slow responses, try a smaller model or make sure your GPU driver is working (nvidia-smi). If Open WebUI cannot reach Ollama, verify that curl http://127.0.0.1:11434/api/tags succeeds on the host, that OLLAMA_BASE_URL points at http://host.docker.internal:11434 (not 127.0.0.1, which resolves to the container itself), and that the Ollama service is listening on an address the container can reach (OLLAMA_HOST=0.0.0.0).
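For example, to see which process owns a port and to review the container's recent logs: sudo ss -tlnp | grep -E ':11434|:3000'
docker logs --tail 100 open-webui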
Wrap-up
You now have a private AI stack on Ubuntu 24.04 with Ollama handling model inference and Open WebUI offering a clean chat interface. With optional GPU acceleration, HTTPS, and simple backups, this setup is fast, secure, and maintainable—perfect for learning, prototyping, or running an internal assistant.