Overview
This step-by-step guide shows how to run open-source large language models (LLMs) locally on Ubuntu 24.04 using Ollama for model serving and Open WebUI for a friendly chat interface. You will install Ollama, enable optional GPU acceleration (NVIDIA or CPU fallback), and deploy Open WebUI with Docker. The result is a private, fast, and controllable AI setup suitable for home labs and small teams.
Prerequisites
You need an Ubuntu 24.04 LTS host with internet access, a user with sudo rights, and at least 8 GB of RAM. A modern NVIDIA GPU is optional but recommended for faster inference. Make sure the system is up to date: sudo apt update && sudo apt -y upgrade
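A quick way to confirm the release, memory, and free disk space before you begin (standard Ubuntu tools, nothing specific to this stack): lsb_release -d
free -h
df -h /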
Step 1 — Install Ollama
Ollama is a lightweight server that downloads and runs models locally. Install it with the official script: curl -fsSL https://ollama.com/install.sh | sh
Enable and start the service so it runs at boot: sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
Verify the API is listening on port 11434: curl http://127.0.0.1:11434/api/tags
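On a fresh install the endpoint should return an empty model list, since nothing has been pulled yet. You can also sanity-check the CLI directly: ollama --version
ollama list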
Step 2 — Optional: Enable GPU Acceleration (NVIDIA)
If you have an NVIDIA GPU, install the recommended driver. Ubuntu makes this easy: sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the driver is active: nvidia-smi
Ollama detects GPUs automatically when drivers are present; no extra flags are required. If you need to force CPU-only inference on an NVIDIA system, hide the GPUs from the server by setting CUDA_VISIBLE_DEVICES to an invalid ID such as -1 in the service environment. Note that exporting variables in your shell profile does not affect the systemd-managed server; set them on the service itself and restart Ollama: sudo systemctl restart ollama
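A minimal sketch of a persistent CPU-only setting using a systemd override (drop the variable again later to re-enable the GPU). Run sudo systemctl edit ollama and add:
[Service]
Environment="CUDA_VISIBLE_DEVICES=-1"
Then apply it: sudo systemctl daemon-reload
sudo systemctl restart ollama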
AMD GPUs can work with ROCm on supported cards and drivers. If you are using AMD, install the ROCm runtime from AMD’s repository for Ubuntu 24.04, confirm with rocminfo, and ensure your user is in the video and render groups. If ROCm is not available for your hardware, Ollama will fall back to CPU.
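As a rough sketch for the AMD path (assuming the ROCm packages are already installed; log out and back in after the group change): sudo usermod -aG video,render $USER
rocminfo | grep -i gfx
If rocminfo lists a gfx… agent for your card, Ollama should be able to use it.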
Step 3 — Pull a Model and Test Locally
Pull a well-supported model. Llama 3 is a popular choice: ollama pull llama3
Run a quick test: ollama run llama3 "Write one sentence about Ubuntu 24.04."
Tip: For smaller footprints, choose lighter models such as phi3 or tinyllama (the plain llama3 tag is already the 8B variant). VRAM needs vary; an 8B model typically benefits from 8–12 GB of GPU VRAM, while CPU-only runs need more system RAM and patience.
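Open WebUI talks to the same HTTP API you tested in Step 1, so it is worth knowing how to call it directly. A quick generation request with curl (the prompt text is arbitrary): curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'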
Step 4 — Install Docker and Open WebUI
Open WebUI gives you a clean browser interface for Ollama. Install Docker from the Ubuntu repositories for a quick start (the Compose plugin is packaged as docker-compose-v2 and is optional here, since this guide uses plain docker run): sudo apt install -y docker.io
Allow your user to manage Docker without sudo, then re-login: sudo usermod -aG docker $USER
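After logging back in (or running newgrp docker in the current shell), confirm that Docker works without sudo: docker run --rm hello-world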
Create a persistent volume for Open WebUI data and start the container. Inside the container, 127.0.0.1 points at the container itself, not the host, so map host.docker.internal to the host gateway and use it as the Ollama address: docker volume create openwebui
docker run -d --name open-webui -p 3000:8080 --restart unless-stopped --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v openwebui:/app/backend/data ghcr.io/open-webui/open-webui:latest
Because Ollama listens only on 127.0.0.1 by default, it also needs to accept connections from the Docker bridge. Set OLLAMA_HOST=0.0.0.0 on the Ollama service the same way as the GPU variable in Step 2 (sudo systemctl edit ollama, add Environment="OLLAMA_HOST=0.0.0.0" under [Service], then sudo systemctl restart ollama), and keep port 11434 closed on your firewall if you do not want it reachable from the rest of your network.
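Before moving on, check that the container is up and watch its startup logs; the first launch can take a minute while Open WebUI initializes: docker ps --filter name=open-webui
docker logs -f open-webui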
Open your browser to http://SERVER_IP:3000 and complete the initial admin setup. Add a model in Settings if it does not appear automatically, for example llama3.
Step 5 — Optional TLS with Caddy (Automatic HTTPS)
If you have a domain pointing to your server (A record), Caddy can auto-provision HTTPS certificates. Install it and configure a simple reverse proxy: sudo apt install -y caddy
Edit /etc/caddy/Caddyfile (replace ai.example.com with your domain): ai.example.com {
reverse_proxy 127.0.0.1:3000
}
Reload Caddy: sudo systemctl reload caddy. Visit https://ai.example.com. Ensure ports 80 and 443 are open on your firewall and router.
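If UFW is your firewall, opening the web ports might look like this (adjust for a different firewall or a cloud security group); caddy validate is also handy for checking the Caddyfile whenever you change it: sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo caddy validate --config /etc/caddy/Caddyfile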
Step 6 — Backups and Updates
Ollama models are stored under /usr/share/ollama/.ollama when the installer's systemd service is in use (as in this guide), or under ~/.ollama if you run the server manually as your own user. Back up the model directory to avoid re-downloading models. Example for the service install: sudo tar czf ollama-backup.tgz /usr/share/ollama/.ollama
Open WebUI data is in the Docker volume openwebui. Back it up with: docker run --rm -v openwebui:/data -v $(pwd):/backup alpine sh -c "cd /data && tar czf /backup/openwebui-backup.tgz ."
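To restore, extract the archive back into the same volume. Note this overwrites whatever is currently in the volume, and it assumes the backup file sits in your current directory: docker run --rm -v openwebui:/data -v $(pwd):/backup alpine sh -c "cd /data && tar xzf /backup/openwebui-backup.tgz"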
To update Ollama: curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl restart ollama. To update Open WebUI: docker pull ghcr.io/open-webui/open-webui:latest && docker stop open-webui && docker rm open-webui, then re-run the full docker run command from Step 4; your chats and settings persist in the openwebui volume.
Troubleshooting
If port 11434 or 3000 is in use, change the published port in the docker run command or stop the conflicting service. For slow responses, try a smaller model or make sure your GPU driver is working (nvidia-smi). If Open WebUI cannot reach Ollama, verify that curl http://127.0.0.1:11434/api/tags succeeds on the host, that OLLAMA_BASE_URL points at http://host.docker.internal:11434 (not 127.0.0.1, which resolves to the container itself), and that the Ollama service is listening on an address the container can reach (OLLAMA_HOST=0.0.0.0).
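For example, to see which process owns a port and to review the container's recent logs: sudo ss -tlnp | grep -E ':11434|:3000'
docker logs --tail 100 open-webui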
Wrap-up
You now have a private AI stack on Ubuntu 24.04 with Ollama handling model inference and Open WebUI offering a clean chat interface. With optional GPU acceleration, HTTPS, and simple backups, this setup is fast, secure, and maintainable—perfect for learning, prototyping, or running an internal assistant.