Install Ollama with Open WebUI on Ubuntu 24.04 (GPU-Accelerated Local AI Chat)

Overview

This step-by-step guide shows how to install Ollama and connect it to Open WebUI on Ubuntu 24.04. With this setup, you can run modern large language models like Llama 3 locally, use your NVIDIA GPU for acceleration, and chat through a clean web interface—no cloud required. The process includes installing system dependencies, enabling GPU support, running Open WebUI in Docker, pulling models, and basic troubleshooting. The instructions use plain language, and every command is tested on Ubuntu 24.04.

Prerequisites

Before you start, make sure you have: (1) Ubuntu 24.04 with sudo access, (2) a modern NVIDIA GPU and driver support (optional but recommended), (3) at least 16 GB of RAM for medium models, and (4) stable internet access to download models and containers.

1) Update Ubuntu and install essentials

Begin by updating your packages and installing the tools we will use. If prompted, confirm with Y:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl ca-certificates gnupg ufw git

2) Install NVIDIA drivers (for GPU acceleration)

Ollama uses your GPU automatically when the correct NVIDIA driver is present. If you do not have a GPU, you can still run models on the CPU (slower). To enable GPU acceleration on NVIDIA cards:

sudo ubuntu-drivers autoinstall
sudo reboot

After reboot, verify the driver:

nvidia-smi

You should see your GPU listed. If you prefer manual control, install a specific driver from “Additional Drivers” in Ubuntu.
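
You can also inspect and pick a driver from the command line. A short sketch (the nvidia-driver-550 package below is only an example; install whichever version ubuntu-drivers recommends for your card):

# List detected GPUs and the recommended driver package
ubuntu-drivers devices

# Install a specific driver version (example package name), then reboot
sudo apt install -y nvidia-driver-550
sudo reboot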

3) Install Ollama

Ollama is a lightweight server that manages models locally and exposes an HTTP API on port 11434. Install it with the official script:

curl -fsSL https://ollama.com/install.sh | sh

Enable and verify the system service:

sudo systemctl enable ollama
sudo systemctl start ollama
systemctl status ollama

If the status shows active (running), Ollama is ready at http://localhost:11434.
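
Because Ollama is just an HTTP server, you can also verify it with curl against its standard API endpoints:

# Should print "Ollama is running"
curl http://localhost:11434

# Show the installed server version
curl http://localhost:11434/api/version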

4) Pull a model and test locally

Pull a modern, efficient model. Llama 3.1 8B is a good starting point (adjust model to your hardware):

ollama pull llama3.1:8b

Run a quick interactive chat in the terminal to make sure the model loads and responds:

ollama run llama3.1:8b

If your GPU is recognized, the first response takes a moment while the model loads into VRAM; subsequent responses should be fast. You can also try other models such as mistral, phi3, or neural-chat.
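
To confirm the GPU is actually being used, and to exercise the HTTP API directly, a quick check (run the nvidia-smi watch in a second terminal while a response is generating):

# Shows loaded models and whether they sit in GPU or CPU memory
ollama ps

# Watch GPU memory and utilization while Ollama generates
watch -n 1 nvidia-smi

# One-shot, non-streaming generation over the API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain quantization in one sentence.",
  "stream": false
}'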

5) Install Docker and run Open WebUI

Open WebUI provides a clean browser interface for chatting with local models. Install Docker from Ubuntu’s repo for simplicity:

sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
# Log out and back in to apply docker group membership (or run a new shell).
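
An optional quick smoke test to confirm Docker works before starting containers (prefix with sudo if you have not logged back in yet):

docker --version
docker run --rm hello-world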

Start Open WebUI and point it to the host Ollama API:

docker run -d --name open-webui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main

Open a browser and go to http://YOUR_SERVER_IP:3000 to access Open WebUI. On first run, create an admin user. In Settings > Connections, confirm the Ollama endpoint is http://host.docker.internal:11434.
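
Later, when a new Open WebUI release is published, you can update by pulling the new image and recreating the container; the named volume keeps your users and chat history:

docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui
# Re-run the docker run command from above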

6) Secure basic network access

If UFW is enabled, allow the Open WebUI port:

sudo ufw allow 3000/tcp
sudo ufw status

For internet exposure, place Open WebUI behind a reverse proxy (Nginx/Caddy) with HTTPS. If you only use it on your LAN, keep it on port 3000 and block external access at your router or firewall.
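
If you only need LAN access, you can scope the UFW rule to your local subnet instead of opening the port to everyone (192.168.1.0/24 below is just an example; substitute your own network):

sudo ufw delete allow 3000/tcp
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp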

7) Daily use tips

- To list models: ollama list. To remove one: ollama rm MODEL.
- To update Ollama when a new version is released: rerun the install script, then sudo systemctl restart ollama.
- For faster chat, choose 7B–8B models or quantized variants (like Q4_K_M); larger models need more VRAM and RAM. See the example after this list.
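
Pulling a quantized build and updating Ollama looks like this (quantization tags vary by model; check the model's page on ollama.com for the tags that actually exist):

# Pull a 4-bit quantized variant of Llama 3.1 8B
ollama pull llama3.1:8b-instruct-q4_K_M

# Update Ollama itself, then restart the service
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama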

Troubleshooting

No compatible GPU found: Check nvidia-smi. If it fails, reinstall drivers with ubuntu-drivers autoinstall and reboot. Ensure Secure Boot is disabled or properly configured for NVIDIA modules.
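
To check the Secure Boot state and whether the NVIDIA kernel module loaded, commands like these can help (mokutil may need to be installed first):

sudo apt install -y mokutil
mokutil --sb-state
lsmod | grep nvidia
sudo dmesg | grep -i nvidia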

Open WebUI cannot reach Ollama: Confirm the container can resolve the host gateway. We used --add-host=host.docker.internal:host-gateway. Also verify the OLLAMA_BASE_URL environment variable and that the Ollama service is active: systemctl status ollama.

Slow generations on CPU: Use smaller models (e.g., 3–8B) or quantized versions. GPU acceleration is the biggest speed boost; ensure drivers are correct.

Ports already in use: If 3000 or 11434 is used, change the exposed port for Open WebUI (-p 4000:8080 for example) and update firewall rules.
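
For example, to move Open WebUI to host port 4000 (the container still listens on 8080 internally):

docker rm -f open-webui
# Re-run the docker run command from step 5, replacing -p 3000:8080 with -p 4000:8080
sudo ufw allow 4000/tcp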

Check logs: Ollama logs: journalctl -u ollama -f. Open WebUI logs: docker logs -f open-webui.

Optional: Reverse proxy with Nginx (HTTPS)

For public access with TLS, install Nginx and Certbot, then map a domain to your server and issue a Let’s Encrypt certificate. Point Nginx to the Open WebUI container on 3000. Keep strong passwords and consider IP allowlists or SSO for security.
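
A minimal sketch, assuming a domain chat.example.com already points at your server (adjust the domain and paths to your setup):

sudo apt install -y nginx certbot python3-certbot-nginx

# Create a server block that proxies to the Open WebUI container on port 3000
sudo tee /etc/nginx/sites-available/open-webui >/dev/null <<'EOF'
server {
    listen 80;
    server_name chat.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # WebSocket upgrade so streaming chat responses work through the proxy
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Obtain a Let's Encrypt certificate and let Certbot configure HTTPS
sudo certbot --nginx -d chat.example.com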

What you get

You now have a private, GPU-accelerated local AI stack: Ollama runs models efficiently on your Ubuntu host, and Open WebUI gives you a modern chat interface. This setup is ideal for development, research, and privacy-focused workflows without sending your data to external clouds.
