Running a fast, private AI chatbot on your own computer or server is easier than ever. In this guide, you will install Ollama (a lightweight local LLM runtime) and Open WebUI (a modern web interface) on Ubuntu 24.04. You will be able to chat with models like Llama 3 or Mistral without sending data to the cloud, and with optional GPU acceleration if you have an NVIDIA card.
What you will need: an Ubuntu 22.04/24.04 machine (VM, bare metal, or WSL2), at least 8 GB of RAM (16 GB recommended for larger models), 15–30 GB of free disk space for models, internet access, and optionally an NVIDIA GPU for acceleration.
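A quick way to confirm that your machine meets these requirements, using only standard Ubuntu tooling:
lsb_release -d    # Ubuntu release
free -h           # total RAM
df -h /           # free disk space on the root filesystem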
Why Ollama + Open WebUI?
Ollama manages local large language models (LLMs) with simple commands and sensible defaults. Open WebUI gives you a clean, chat-style interface with features like prompt history, file uploads (for some models), and model switching. Together, they are a simple, reliable stack for a self-hosted AI experience.
1) Update Ubuntu and install basics
First, refresh your package list and install required tools:
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl ca-certificates
2) Install Ollama and start the service
Ollama provides an installer script for Linux. Run the following to install and start the service:
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Verify that the Ollama API is listening on port 11434:
ss -tulpn | grep 11434
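You can also confirm that the service responds over HTTP; the root endpoint simply reports that Ollama is running:
curl http://127.0.0.1:11434          # should print: Ollama is running
curl http://127.0.0.1:11434/api/version
systemctl status ollama --no-pager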
3) Pull a model (Llama 3 as an example)
Ollama hosts a registry of optimized models. Pull a popular general-purpose model such as Llama 3 8B:
ollama pull llama3:8b
After the download completes, you can test it quickly:
ollama run llama3:8b
Type a prompt and press Enter. Type /bye (or press Ctrl+D) to exit the interactive session.
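For scripting, you can also pass a prompt directly on the command line or call the REST API; the prompt text here is just an example:
ollama run llama3:8b "Explain what systemd is in one sentence."
curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Why is the sky blue?", "stream": false}'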
4) Install Docker and run Open WebUI
Open WebUI is easiest to deploy with Docker. Install Docker using the official convenience script, then start the container:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER # log out/in after this
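A quick way to confirm that Docker works (until you log out and back in, prefix these with sudo):
docker --version
docker run --rm hello-world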
Run Open WebUI and connect it to your local Ollama service. Inside a container, 127.0.0.1 refers to the container itself rather than the host, so the command uses Docker's host.docker.internal alias to reach Ollama on the host:
docker run -d --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:latest
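If the web UI later reports that it cannot reach Ollama (for example, the model list stays empty), the usual cause is that Ollama listens only on 127.0.0.1 by default, which the container cannot reach over the Docker bridge. One way to fix this, assuming the systemd service created by the installer, is a drop-in override; on a public server, firewall port 11434 afterwards:
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl daemon-reload && sudo systemctl restart ollama
Alternatively, run the container with --network=host, in which case OLLAMA_BASE_URL=http://127.0.0.1:11434 works as-is and the UI is served on port 8080 instead of 3000.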
Open your browser and go to http://SERVER_IP:3000. On first access you will create an admin account. Then choose your default model (e.g., llama3:8b) from the interface and start chatting.
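If the page does not load, check that the container is up and inspect its startup logs:
docker ps --filter name=open-webui
docker logs --tail 50 open-webui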
5) Enable GPU acceleration (optional, NVIDIA)
If your machine has an NVIDIA GPU, install the official driver from Ubuntu’s Additional Drivers or with sudo apt install nvidia-driver-XXX (replace XXX with a recommended version). Reboot and verify with nvidia-smi. Ollama will auto-detect CUDA and use your GPU for supported models, delivering much faster responses. You do not need GPU pass-through to Docker for this setup because Ollama runs on the host.
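A short check sequence (ubuntu-drivers comes from the ubuntu-drivers-common package, which may not be preinstalled on minimal server images):
ubuntu-drivers devices               # lists the recommended driver version
nvidia-smi                           # confirms the driver is loaded after reboot
ollama run llama3:8b "hello" && ollama ps    # the PROCESSOR column should report GPU usage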
6) Secure and harden your deployment
Local-only binding: If you are on a public server, avoid exposing the UI directly. Bind Open WebUI to localhost and place a reverse proxy with HTTPS in front:
docker rm -f open-webui
docker run -d --name open-webui \
  -p 127.0.0.1:3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:latest
Reverse proxy tip: Use any TLS-capable proxy (Nginx, Caddy, Traefik). For example, with Caddy you can map your domain to localhost:3000 and get automatic HTTPS. Protect access using password auth or your proxy’s single sign-on.
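For instance, a minimal Caddyfile for this setup could look like the following; chat.example.com is a placeholder, and its DNS record must point at your server so Caddy can obtain a certificate automatically:
chat.example.com {
    reverse_proxy 127.0.0.1:3000
}
If you installed Caddy from its apt repository, reload it with sudo systemctl reload caddy and the UI becomes available at https://chat.example.com.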
7) Daily use and model management
Switch models in the Open WebUI sidebar or pull additional ones via Ollama. Useful commands:
# list local models
ollama list
# pull a different model
ollama pull mistral:7b
# remove unused models to free space
ollama rm model_name
When you click “New Chat” in Open WebUI, you can choose the model and adjust temperature, system prompt, and other parameters. For tasks like coding or reasoning, try llama3.1 or mistral-nemo variants if available for your hardware.
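If you keep reusing the same system prompt, you can also bake it into a named model with an Ollama Modelfile; the name code-helper and the prompt text below are only examples:
# save as Modelfile
FROM llama3:8b
PARAMETER temperature 0.2
SYSTEM You are a concise coding assistant. Prefer working code over long explanations.
Build it, and the new model appears in both ollama list and the Open WebUI model picker:
ollama create code-helper -f Modelfile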
8) Updating the stack
Keep components fresh to get speed and quality improvements:
# update Ollama binary
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
# update Open WebUI container
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui && docker rm open-webui
docker run ...    # re-run the same docker run command as before; chat history persists in the open-webui volume
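To confirm the update took effect, check the versions that are actually running:
ollama --version
docker ps --filter name=open-webui --format '{{.Image}} {{.Status}}'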
9) Troubleshooting common issues
Port conflicts: If 11434 or 3000 is already in use, pick a different host port (for example, -p 8081:8080 for Open WebUI). Check usage with ss -tulpn.
Insufficient VRAM or RAM: Large models may fail to load. Try a smaller variant (e.g., llama3:8b instead of 70b), or use quantized builds where available.
No GPU detected: Ensure the NVIDIA driver is installed and loaded (nvidia-smi works). Reboot after driver installation. Ollama falls back to CPU if no GPU is available.
Docker permissions: If you see “permission denied,” log out and back in after adding your user to the docker group, or run commands with sudo.
Disk space: Models can be large. Use ollama list and ollama rm to remove what you do not need. Check usage with df -h.
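When the cause is not obvious, the service logs usually are; these commands show recent output from both components:
journalctl -u ollama -n 100 --no-pager
docker logs --tail 100 open-webui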
10) What’s next?
Explore prompt templates, create system prompts for repeatable tasks, and try specialized models for coding, document Q&A, or SQL. You can also connect Open WebUI to external tools, set up team access behind your company SSO, or run multiple instances for different workloads. With Ollama and Open WebUI, you have a fast, private, and extensible foundation for local generative AI on Ubuntu.