Run a Local AI Chatbot on Ubuntu with Ollama and Open WebUI (GPU Ready)

This step-by-step guide shows you how to run a fast, private, and local AI chatbot on Ubuntu 22.04 or 24.04 using Ollama and Open WebUI. You will install the Ollama runtime, pull a modern large language model, and add a clean chat interface via Open WebUI in Docker. Optional steps cover NVIDIA GPU acceleration, API usage, and persistence. The result is a secure, offline-friendly setup suitable for helpdesk, coding assistance, or knowledge base querying without sending data to the cloud.

Why Ollama + Open WebUI

Ollama makes it simple to run and manage open-source LLMs locally (Llama 3.x, Mistral, Phi, Qwen, and more). Open WebUI adds a user-friendly, browser-based chat interface with conversation history, prompt templates, and multi-model support. Together they form a robust, low-maintenance local AI stack for Linux desktops and servers.

Prerequisites

- Ubuntu 22.04 or 24.04 with a non-root sudo user.
- At least 8 GB RAM (16 GB recommended for larger models).
- Optional NVIDIA GPU for acceleration (for example, a T4, GeForce RTX, or RTX A-series card).
- Internet access to download models and containers.
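
A quick way to confirm the basics before you start (nvidia-smi will only report a GPU once an NVIDIA driver is installed):
lsb_release -ds        # Ubuntu release
free -h                # installed RAM
df -h /                # free disk space for models
nvidia-smi             # optional: lists the GPU if a driver is present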

Step 1 — Install Ollama

1) Update packages:
sudo apt update && sudo apt install -y curl ca-certificates
2) Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
3) Make sure the service is enabled and started (the installer usually does this for you):
sudo systemctl enable --now ollama
4) Verify the API is up:
curl http://localhost:11434/api/tags
If you see JSON, Ollama is running correctly.
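
You can also confirm the installed version and that the systemd service is active (exact output varies by release):
ollama --version
systemctl status ollama --no-pager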

Step 2 — Pull and test a model

Pull a compact, capable model first to validate your setup:
ollama pull llama3.2
Run an interactive test:
ollama run llama3.2
Type a prompt to chat; when you are done, type /bye (or press Ctrl+D) to exit. You can later try larger models (for example, ollama pull mistral or ollama pull llama3.1), but start small to confirm everything works.
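
For a quick, non-interactive check, you can also pass the prompt on the command line and list the models you have pulled so far:
ollama run llama3.2 "Say hello in exactly five words."
ollama list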

Step 3 — Install Docker Engine

1) Add Docker’s repo key and source:
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $UBUNTU_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
2) Install Docker and the Compose plugin:
sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
3) Add your user to the docker group and refresh your shell:
sudo usermod -aG docker $USER
newgrp docker
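
Before moving on, it is worth sanity-checking the installation with Docker's standard test image; if you get a permission error, log out and back in so the new group membership takes effect:
docker --version
docker run --rm hello-world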

Step 4 — Run Open WebUI connected to Ollama

Start Open WebUI and point it at the Ollama API on the host. The --add-host flag maps host.docker.internal to the host's gateway address, so the container can reach the Ollama service listening on port 11434 of the host:

docker run -d --name open-webui --restart=unless-stopped -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main

Open your browser to http://SERVER_IP:3000 (or http://localhost:3000). Create the first admin account, choose a model (for example, llama3.2), and start chatting.
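
If host.docker.internal does not resolve in your environment, an alternative sketch is to run the container on the host network instead of publishing a port; with --network=host the UI listens on the image's default port, so it is reachable at http://localhost:8080 rather than :3000:

docker rm -f open-webui
docker run -d --name open-webui --restart=unless-stopped \
--network=host \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main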

Step 5 — Enable NVIDIA GPU acceleration (optional)

1) Install the latest NVIDIA driver for your GPU using Ubuntu’s Additional Drivers or apt. Reboot if prompted.
2) Install the NVIDIA Container Toolkit so Docker can access the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
3) Optionally recreate Open WebUI with GPU access. Chat inference runs in Ollama on the host, so this is only needed if you want Open WebUI's own features (such as local RAG embeddings or speech-to-text) to use the GPU:
docker rm -f open-webui
docker run -d --name open-webui --restart=unless-stopped --gpus all -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main

4) Ollama uses the GPU automatically once the driver is installed and a compatible model is loaded; no container changes are required for that. You can confirm GPU use by running nvidia-smi during inference.
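
Before relying on it, you can verify that containers actually see the GPU with a throwaway CUDA image (the exact tag below is just an example; any recent nvidia/cuda base tag should work):
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi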

Step 6 — Use the Ollama HTTP API

You can script local inference via HTTP without the UI. Example generation request:
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Write a haiku about backups."}'
Chat format (the conversation history is supplied by you in the messages array on each request):
curl http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Explain DNS in one sentence."}]}'
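
Both endpoints stream JSON lines by default. For scripting, it is often easier to disable streaming and extract just the text with jq (install it with sudo apt install -y jq); this sketch uses the generate endpoint's response field:
curl -s http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Write a haiku about backups.","stream":false}' | jq -r .response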

Step 7 — Persistence, autostart, and updates

- Ollama models live under ~/.ollama/models when you run the server as your own user; with the systemd service installed by the script, they are typically stored in /usr/share/ollama/.ollama/models. Back up the models directory to avoid re-downloading large files.
- The Open WebUI container uses a named volume (open-webui) for its data, which persists across restarts.
- Ollama is already set to start at boot (systemctl enable ollama). The WebUI container uses --restart=unless-stopped so it will auto-start after a reboot.
- Update Ollama: curl -fsSL https://ollama.com/install.sh | sh
- Update Open WebUI: docker pull ghcr.io/open-webui/open-webui:main, then remove and recreate the container with the docker run command from Step 4 (docker rm -f open-webui first); restarting alone keeps running the old image.
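
A minimal backup sketch, assuming the service-install model path and the named volume above (adjust the paths and archive names to your setup):
# archive pulled models (use ~/.ollama instead if you run ollama as your own user)
sudo tar -czf ollama-models-backup.tar.gz -C /usr/share/ollama/.ollama models
# archive Open WebUI data out of the named volume
docker run --rm -v open-webui:/data -v "$PWD":/backup alpine tar -czf /backup/open-webui-data.tar.gz -C /data .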

Troubleshooting

- Open WebUI cannot connect to Ollama: ensure you used --add-host=host.docker.internal:host-gateway and that curl http://localhost:11434/api/tags works on the host.
- Port already in use: change -p 3000:8080 to a different host port like -p 3333:8080.
- Out of memory or slow responses: try a smaller model (for example, llama3.2 or phi3). Close other apps or add swap. For CPU-only hosts, expect slower performance on large models.
- GPU not used: verify the driver with nvidia-smi and check that the container runs with --gpus all. If VRAM is limited, choose a smaller model or a lower-precision quantization tag.
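
When the cause is not obvious, the container logs and a quick port check usually narrow things down (3000 and 11434 are the ports used in this guide):
docker logs --tail 50 open-webui
ss -ltnp | grep -E '3000|11434'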

What you can do next

- Connect knowledge bases or documents using Open WebUI’s RAG features to power local search over PDFs and wikis.
- Add multiple models and switch per chat, benchmarking speed and quality.
- Put Nginx or Caddy in front of :3000 for HTTPS and trusted network access.
- Automate prompts with shell scripts or Python by calling the local Ollama API.
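
As a starting point for that last item, here is a minimal shell sketch that sends a one-off prompt to the local API; the script name ask.sh and the default model are arbitrary choices, and it assumes jq is installed:
#!/usr/bin/env bash
# ask.sh: send a prompt to the local Ollama API and print the reply
# usage: ./ask.sh "Explain DNS in one sentence."
set -euo pipefail
MODEL="${MODEL:-llama3.2}"
PROMPT="${1:?usage: $0 \"your prompt\"}"
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg m "$MODEL" --arg p "$PROMPT" '{model:$m, prompt:$p, stream:false}')" \
  | jq -r .response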

You now have a private, local AI assistant on Ubuntu with a clean web interface, GPU-ready acceleration, and a stable upgrade path—all without sending your data to third-party services.
