Overview
This guide shows you how to run a private, local AI chatbot on Ubuntu using Ollama (for running large language models on CPU or GPU) and Open WebUI (a clean web interface). You will be able to chat with models like Llama 3 locally, without sending data to the cloud. The steps work on Ubuntu 22.04 and 24.04, and are suitable for both desktops and servers.
What You Will Set Up
You will install Ollama on the host, pull a model, and run Open WebUI in Docker. Open WebUI will connect to the Ollama API on port 11434. The result is a self-hosted, secure, and fast AI assistant accessible at http://YOUR_SERVER_IP:3000.
Prerequisites
- Ubuntu 22.04 or 24.04 with sudo access
- At least 8 GB RAM (16 GB+ recommended for 7B–8B models; use smaller quantized models on low-RAM systems)
- Optional: NVIDIA or AMD GPU for faster inference
- Internet access to download packages and models
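To confirm your hardware meets these requirements before you start, a few standard commands are enough (nvidia-smi only reports details once the NVIDIA driver is installed):
lsb_release -d              # Ubuntu release
free -h                     # total and available RAM
nproc                       # CPU core count
lspci | grep -Ei 'vga|3d'   # detect a discrete GPU, if any
nvidia-smi                  # driver and VRAM details (NVIDIA only)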
Step 1: Install Ollama
Ollama is a lightweight runtime that serves models locally over an HTTP API (default: 11434). Install it with one command:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation and service:
ollama --version
systemctl status ollama
If the service is inactive, start it:
sudo systemctl enable --now ollama
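To confirm the API is reachable on its default port, you can query it directly: the root endpoint simply reports that Ollama is running, and /api/tags returns the models you have pulled (empty until Step 2):
curl http://localhost:11434            # should print: Ollama is running
curl http://localhost:11434/api/tags   # JSON list of locally available models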
Step 2: Pull a Model (Llama 3.1 as an example)
Download a good general-purpose model. Llama 3.1 8B offers a solid balance of quality and resource use for many machines:
ollama pull llama3.1:8b
Test it in the terminal (type /bye to exit the interactive session):
ollama run llama3.1:8b
>>> Write a 1-sentence productivity tip.
Tip: If you run out of memory, pull a quantized variant, for example:
ollama pull llama3.1:8b-instruct-q4_K_M
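Open WebUI will talk to the same HTTP API that ollama run uses behind the scenes. If you want to see what that looks like, a minimal non-streaming request to the /api/generate endpoint returns the model's reply as JSON:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a 1-sentence productivity tip.",
  "stream": false
}'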
Step 3: Install Docker Engine
Open WebUI is distributed as a container image, so the simplest way to run it is with Docker. Install Docker Engine from the official repository:
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
Log out and in again (or run a new shell) to use Docker without sudo.
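A quick sanity check confirms the Docker daemon is running and that your user can talk to it without sudo:
docker --version
docker run --rm hello-world   # pulls a tiny test image and prints a confirmation message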
Step 4: Run Open WebUI and Connect to Ollama
Start Open WebUI in Docker, mapping host port 3000 to the container's port 8080 and pointing it at the Ollama API on the host. The host-gateway alias requires Docker 20.10 or newer:
docker run -d --name open-webui \
--restart unless-stopped \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:latest
Open your browser at http://YOUR_SERVER_IP:3000, create an admin account on first launch, and select the default model (e.g., llama3.1:8b). You can now chat and manage prompts, files, and settings.
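Before opening the browser, you can verify that the container is healthy and that the web interface answers on port 3000 (the first start can take a minute while the app initializes):
docker ps --filter name=open-webui           # STATUS should show "Up"
docker logs --tail 20 open-webui             # startup messages, including the Ollama connection
curl -sI http://localhost:3000 | head -n 1   # expect an HTTP 200 once the app has started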
If the container cannot reach the host, an alternative is host networking. Note that with --network=host no port mapping applies, so Open WebUI is served on its internal port 8080 (http://YOUR_SERVER_IP:8080):
docker run -d --name open-webui \
--restart unless-stopped \
--network=host \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:latest
Step 5: Secure Access
- By default, Open WebUI uses account-based sign-in, and the first account created becomes the administrator. In the Admin Panel, disable new sign-ups (or keep the default role set to pending so new accounts require admin approval) once your users are onboarded.
- If you run on a public server, restrict binding to localhost and use an SSH tunnel:
# bind only to localhost:
docker run -d --name open-webui \
--restart unless-stopped \
-p 127.0.0.1:3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:latest
# from your laptop:
ssh -L 3000:localhost:3000 [email protected]
Step 6: Use Your GPU (Optional)
Ollama will use your GPU if supported drivers are installed. For NVIDIA, install the proprietary driver and CUDA libraries (the standard Ubuntu “Additional Drivers” tool works). For AMD, install ROCm per Ubuntu/AMD documentation. Because Open WebUI talks to Ollama’s API, you do not need GPU support inside the Open WebUI container—only in Ollama on the host.
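To check whether inference is actually running on the GPU, watch the driver's utilization while a prompt is being answered, and ask Ollama where the loaded model is running (the ollama ps command is available in recent Ollama releases):
nvidia-smi    # GPU utilization and VRAM use while a prompt is running (NVIDIA)
ollama ps     # loaded models and whether they are running on GPU or CPU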
Step 7: Model Management Tips
- List models: ollama list
- Show model metadata: ollama show llama3.1:8b
- Remove a model: ollama rm MODEL_NAME
- Try alternatives: ollama pull mistral, ollama pull qwen2, or ollama pull phi3:mini for lower-memory systems.
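You can also bake a system prompt and parameters into a custom model with a Modelfile, so every chat in Open WebUI starts from the same persona. A minimal sketch (the support-assistant name and prompt are just examples):
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER temperature 0.3
SYSTEM You are a concise, friendly assistant for an internal helpdesk.
EOF
ollama create support-assistant -f Modelfile
ollama run support-assistant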
Troubleshooting
- Port in use: Change the mapped port, for example -p 4000:8080 and open http://YOUR_SERVER_IP:4000.
- Container cannot reach Ollama: Use --network=host or ensure --add-host=host.docker.internal:host-gateway is present.
- Out-of-memory: Pull a smaller/quantized model (e.g., q4_K_M variants) or close other apps.
- Slow responses: Prefer GPU, reduce context length in Open WebUI settings, or choose a smaller model.
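For any of the issues above, the service and container logs usually point to the cause, and a quick port listing shows whether Ollama and Open WebUI are actually listening:
journalctl -u ollama -n 50 --no-pager   # recent Ollama service logs
docker logs --tail 50 open-webui        # recent Open WebUI logs
sudo ss -tlnp | grep -E '11434|3000'    # confirm both ports are listening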
Updating and Maintenance
- Update Ollama: rerun the install script, then restart the service: curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl restart ollama.
- Update Open WebUI:
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui && docker rm open-webui
# run again with the same docker run command used earlier
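Your chats and settings live in the open-webui named volume, so they survive the container being removed and recreated. If you want an extra safety net before updating, a simple tar-based backup of the volume looks like this (the backup filename is just an example):
docker run --rm -v open-webui:/data -v "$PWD":/backup alpine \
  tar czf /backup/open-webui-backup.tar.gz -C /data .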
Conclusion
With Ollama and Open WebUI, you can host a private, fast, and flexible chatbot on your own Ubuntu machine. This setup keeps your data local, supports multiple open models, and can leverage your GPU for speed. Whether you run a helpdesk, write software, or are simply a power user, this self-hosted stack gives you full control over your AI workflow.