How to Self-Host a Local AI Chatbot: Ollama + Open WebUI on Ubuntu 24.04 with NVIDIA GPU

Overview

This tutorial shows how to self-host a fast, private AI chatbot using Ollama and Open WebUI on Ubuntu 24.04 with an NVIDIA GPU. Ollama runs large language models locally and uses your GPU for acceleration; Open WebUI provides a sleek, browser-based interface. By the end, you will have a GPU-accelerated setup that runs the UI in Docker and Ollama as a systemd service, with optional hardening for exposure beyond your LAN.

Prerequisites

- Ubuntu 24.04 LTS (server or desktop) with sudo access.

- An NVIDIA GPU (Turing or newer recommended) and a reliable internet connection.

- Optional: A domain name if you plan to expose the UI over HTTPS.

Step 1: Install NVIDIA Driver and Verify CUDA

Install the recommended NVIDIA driver:

sudo apt update && sudo ubuntu-drivers install

Reboot and verify:

sudo reboot

nvidia-smi

You should see your GPU details and driver version. If Secure Boot is enabled and blocks the driver, either enroll a MOK when prompted or temporarily disable Secure Boot in firmware settings.
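
If you are unsure whether Secure Boot is the problem, a quick check (assuming mokutil is installed, as it is on most Ubuntu systems) is:

mokutil --sb-state

lsmod | grep nvidia

If Secure Boot is enabled and lsmod shows no nvidia modules, the unsigned driver is most likely being blocked.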

Step 2: Install Ollama and Enable the Service

Ollama provides a one-line installer that sets up the binary and a systemd service listening on localhost:11434.

curl -fsSL https://ollama.com/install.sh | sh

Check the service:

systemctl status ollama

If needed, start and enable it:

sudo systemctl enable --now ollama
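
You can also confirm the API is answering on its default local address; it should return a small JSON object with the installed version:

curl http://127.0.0.1:11434/api/version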

Step 3: Pull a Model and Test GPU Acceleration

Pull a modern, efficient model like Llama 3 (choose the size that fits your VRAM; 8B works on many GPUs):

ollama pull llama3

Run a quick test and watch GPU utilization in another terminal with nvidia-smi:

ollama run llama3 "Write a two-line poem about Ubuntu and GPUs."

If the model runs and nvidia-smi shows activity, GPU acceleration is working. If not, confirm the driver is loaded and retry.
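
Ollama's own CLI is useful for confirming what is installed and where it is running; in recent versions, ollama ps includes a PROCESSOR column that shows whether the loaded model is resident on the GPU or CPU:

ollama list

ollama ps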

Step 4: Install Docker and Compose Plugin

Open WebUI will run in Docker to simplify updates and isolation.

sudo apt update && sudo apt install -y docker.io docker-compose-v2

Ubuntu 24.04 ships the Compose plugin as docker-compose-v2; if you prefer Docker's official packages, add Docker's apt repository and install docker-ce and docker-compose-plugin from there instead.

Add your user to the Docker group to avoid using sudo:

sudo usermod -aG docker $USER && newgrp docker
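
Before moving on, run a quick sanity check that Docker and the Compose plugin work without sudo; hello-world simply pulls a tiny test image and prints a confirmation message:

docker --version

docker compose version

docker run --rm hello-world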

Step 5: Deploy Open WebUI with Docker Compose

Create a project folder and a compose file that points Open WebUI at the host’s Ollama endpoint. Keep in mind that Ollama only accepts connections from localhost by default; if the UI cannot reach it later, see the Troubleshooting section for the OLLAMA_HOST override.

mkdir -p ~/open-webui && cd ~/open-webui

cat > compose.yaml << 'YAML'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
    restart: unless-stopped
volumes:
  open-webui: {}
YAML

Start the container:

docker compose up -d
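
To verify the container started cleanly, check its status and follow the logs (open-webui is the service name from the compose file above):

docker compose ps

docker compose logs -f open-webui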

Open the firewall (if UFW is enabled):

sudo ufw allow 3000/tcp
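
If the UI only needs to be reachable from your LAN, a narrower rule is safer than opening the port to everyone; the subnet below is an example, so substitute your own network range:

sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp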

Step 6: First-Time Access and Configuration

Visit http://<server-ip>:3000, create your admin account, and confirm the Ollama connection URL is set to http://host.docker.internal:11434. You can now select or download models from the UI or use the ones you already pulled via the CLI. Start chatting to verify responses are fast and local.
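
From the server itself, you can confirm the UI is being served before switching to a browser; this simply checks that something answers on port 3000:

curl -I http://localhost:3000

You should get back an HTTP response header (for example, 200 OK).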

Optional: Security and HTTPS

If you plan to expose Open WebUI to the internet, do not publish port 3000 directly. Instead, set the container to bind locally and put a reverse proxy in front with TLS.

- Bind UI to localhost only: edit the ports line to "127.0.0.1:3000:8080".

- Use a reverse proxy like Caddy or Nginx, obtain a TLS certificate (e.g., Let’s Encrypt), and enable basic auth or OAuth. Keep your system updated and restrict access with a firewall or VPN.
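
As one possible starting point, a minimal Caddyfile for this setup could look like the following, where chat.example.com is a placeholder domain and Caddy is installed per its official instructions; Caddy obtains and renews the TLS certificate automatically and proxies to the locally bound UI:

chat.example.com {
    reverse_proxy 127.0.0.1:3000
}

You can layer Caddy's basic auth (or an OAuth proxy) on top if the built-in Open WebUI login is not enough for your threat model.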

Troubleshooting

No GPU usage: Run nvidia-smi. If it shows “No devices were found,” reinstall the driver with sudo ubuntu-drivers install, make sure Secure Boot is handled (MOK enrolled or Secure Boot disabled), and reboot. Confirm you are not in a VM without GPU passthrough.

Ollama not responding: Check systemctl status ollama and logs with journalctl -u ollama -e. Ensure it listens on localhost:11434 and no other service conflicts.

Open WebUI cannot reach Ollama: Confirm the extra_hosts entry for host-gateway is present. Because Ollama binds only to 127.0.0.1 by default, connections from the Docker bridge are refused until you change its listen address (see below). You can also try setting OLLAMA_BASE_URL=http://<host-ip>:11434 instead of host.docker.internal, then restart with docker compose up -d.
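
A common fix is to let the Ollama service listen on all interfaces through a systemd override; this uses Ollama's documented OLLAMA_HOST variable, and you should keep the port firewalled from untrusted networks afterwards:

sudo systemctl edit ollama

In the editor that opens, add the following and save:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then restart the service:

sudo systemctl restart ollama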

Docker permission denied: Re-run sudo usermod -aG docker $USER, then newgrp docker or log out/in.

Out-of-memory on big models: Choose a smaller or more heavily quantized model (e.g., llama3:8b), or reduce the context length. If needed, you can also lower the num_gpu model option so fewer layers are offloaded to the GPU. Monitor VRAM usage with nvidia-smi.

Updates, Backups, and Removal

Update Ollama: Re-run the installer to fetch the latest version: curl -fsSL https://ollama.com/install.sh | sh. Update models with ollama pull <model:tag>.

Update Open WebUI: cd ~/open-webui && docker compose pull && docker compose up -d.

Back up UI data: The named volume stores settings and chats. Docker Compose prefixes it with the project name, so it is typically called open-webui_open-webui (confirm with docker volume ls). Snapshot it with docker run --rm -v open-webui_open-webui:/data -v $(pwd):/backup busybox tar czf /backup/open-webui-backup.tgz -C / data.
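
To restore that snapshot onto a fresh deployment, stop the UI first (docker compose down) and reverse the operation; this assumes the same volume name and that the archive is in the current directory:

docker run --rm -v open-webui_open-webui:/data -v $(pwd):/backup busybox tar xzf /backup/open-webui-backup.tgz -C /

Then bring the UI back up with docker compose up -d.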

Uninstall (optional): Stop the UI and remove its data volume with docker compose down -v. Disable Ollama with sudo systemctl disable --now ollama. Remove the Ollama binary and model files only if you no longer need them.

Conclusion

You now have a private, GPU-accelerated AI chatbot running locally with Ollama and Open WebUI on Ubuntu 24.04. This setup is fast, secure, and easy to maintain. You can switch models on demand, keep everything offline, and scale performance by upgrading your GPU or choosing optimized models.
