Overview
This guide shows how to deploy Ollama with GPU acceleration on Ubuntu 24.04 and connect it to Open WebUI for a friendly chat interface. You will set up drivers, install Ollama, run Open WebUI in Docker, lock down network access, and tune performance. The result is a secure, fast, and maintainable local AI stack suitable for developers, helpdesk teams, and privacy-focused environments.
Prerequisites
You need an Ubuntu 24.04 server (or desktop) with at least 16 GB RAM, a modern CPU, and optionally an NVIDIA GPU (Turing or newer recommended). You also need sudo access and an open outbound internet connection to fetch packages and models.
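A few quick checks confirm the basics before you start; free -h shows total RAM, nproc the CPU core count, and the lspci line lists any NVIDIA GPU (only relevant if you plan to use acceleration):
free -h
nproc
lspci | grep -i nvidia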
1) Update the system
sudo apt update && sudo apt -y upgrade && sudo reboot
2) Install NVIDIA drivers (GPU users)
If you have an NVIDIA GPU, install the proprietary driver to enable acceleration; Ollama bundles the CUDA libraries it needs, so the driver alone is sufficient. Ubuntu’s repository provides a good default.
sudo ubuntu-drivers autoinstall
sudo reboot
Verify the driver is working:
nvidia-smi
If you see your GPU and driver version, you are ready. CPU-only users can skip this section; Ollama will fall back to CPU automatically.
3) Install Ollama
Ollama provides a simple installer that configures a systemd service and the command-line client.
curl -fsSL https://ollama.com/install.sh | sh
After installation, the service runs on localhost port 11434 by default. Confirm status:
systemctl status ollama
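You can also confirm the API is answering; the version endpoint returns a small JSON payload:
curl -s http://127.0.0.1:11434/api/version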
Pull a model to test the pipeline. Llama 3 is a popular general-purpose model:
ollama pull llama3
Run a quick prompt:
ollama run llama3 "Write one sentence about Ubuntu 24.04."
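The same prompt also works over the REST API, which is handy for scripting; this calls the standard /api/generate endpoint with streaming disabled:
curl -s http://127.0.0.1:11434/api/generate -d '{"model": "llama3", "prompt": "Write one sentence about Ubuntu 24.04.", "stream": false}'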
4) Harden Ollama’s listener
By default, Ollama listens on 127.0.0.1. Keep it that way to avoid exposing the API publicly. Confirm the binding:
ss -ltnp | grep 11434
If you ever need to set the binding explicitly (or restore it after experimenting), create a systemd override and set the environment variable.
sudo systemctl edit ollama
Add the following lines, then save:
[Service]
Environment=OLLAMA_HOST=127.0.0.1:11434
Save and exit the editor, then reload and restart:
sudo systemctl daemon-reload && sudo systemctl restart ollama
5) Install Docker and run Open WebUI
Open WebUI is a lightweight, modern web front end that can connect to a local Ollama. Install Docker, then run Open WebUI as a container.
sudo apt -y install docker.io
sudo systemctl enable --now docker
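A quick sanity check that the Docker daemon is working:
sudo docker run --rm hello-world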
Run Open WebUI and publish it on port 3000. The extra host mapping makes host.docker.internal resolve to the host’s address on the Docker bridge (typically 172.17.0.1). One caveat: traffic from the container arrives on that bridge interface rather than on loopback, so if Ollama is bound strictly to 127.0.0.1 the container cannot reach it. In that case, either set OLLAMA_HOST=0.0.0.0:11434 in the step 4 override (the firewall in step 7 still keeps the port closed to the internet, though you may need to admit the bridge with sudo ufw allow in on docker0 to any port 11434 proto tcp), or run the container with --network=host and point it at http://127.0.0.1:11434.
sudo docker run -d --name openwebui --restart=unless-stopped -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:main
Open your browser and visit http://server-ip:3000. On first launch, create an admin user. In Settings, confirm the Ollama endpoint is http://host.docker.internal:11434.
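If the page does not load, make sure the container is running and check its logs:
sudo docker ps --filter name=openwebui
sudo docker logs --tail 50 openwebui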
6) Add HTTPS with Nginx (recommended)
Protect logins and prompts with TLS. Point a DNS record (e.g., ai.example.com) to your server, then install Nginx and Certbot.
sudo apt -y install nginx certbot python3-certbot-nginx
Create a reverse proxy for Open WebUI on port 3000. The heredoc delimiter is quoted so the shell does not expand the $ variables before Nginx sees them, and the Upgrade/Connection headers keep Open WebUI’s websocket connections working:
sudo tee /etc/nginx/sites-available/openwebui.conf >/dev/null <<'EOF'
server {
    listen 80;
    server_name ai.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
sudo ln -s /etc/nginx/sites-available/openwebui.conf /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Issue a free certificate:
sudo certbot --nginx -d ai.example.com --redirect
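Certbot sets up automatic renewal via a systemd timer; you can confirm renewal works with a dry run:
sudo certbot renew --dry-run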
7) Lock down the firewall
Use UFW to allow only SSH and HTTPS. Do not expose port 11434 (Ollama) or 3000 (Open WebUI) directly to the internet. Be aware that Docker writes its own iptables rules for published ports and bypasses UFW, so -p 3000:8080 stays reachable from outside even after UFW is enabled; now that Nginx is proxying, republish the container on loopback only with -p 127.0.0.1:3000:8080 (step 9 does this when adding the data mount).
sudo apt -y install ufw
sudo ufw default deny incoming
sudo ufw allow OpenSSH
sudo ufw allow 443/tcp
sudo ufw enable
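Verify the resulting rule set:
sudo ufw status verbose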
8) Performance tuning
If you have a GPU, keeping models resident in VRAM between requests avoids reload latency and recomputation. Ollama unloads an idle model after five minutes by default; raise that with OLLAMA_KEEP_ALIVE. Create an environment file for Ollama.
sudo mkdir -p /etc/ollama
echo 'OLLAMA_KEEP_ALIVE=1h' | sudo tee /etc/ollama/environment
Bind the environment file via systemd:
sudo systemctl edit ollama
Add:
[Service]
EnvironmentFile=/etc/ollama/environment
Save and exit, then reload and restart:
sudo systemctl daemon-reload && sudo systemctl restart ollama
Other useful variables: OLLAMA_NUM_PARALLEL to control concurrent requests, OLLAMA_FLASH_ATTENTION=1 on supported GPUs, and OMP_NUM_THREADS for CPU-bound workloads. For large models, ensure enough disk space in the model store: the systemd service installed by the script keeps models under /usr/share/ollama/.ollama/models (the service user’s home), while a manually started ollama serve uses ~/.ollama/models instead.
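As an illustration, a fuller environment file might combine several of these; the values below are examples to tune for your hardware, not recommendations:
sudo tee /etc/ollama/environment >/dev/null <<'EOF'
OLLAMA_KEEP_ALIVE=1h
OLLAMA_NUM_PARALLEL=2
OLLAMA_FLASH_ATTENTION=1
EOF
sudo systemctl restart ollama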
9) Backups and persistence
With the service install, models live in /usr/share/ollama/.ollama/models, and Open WebUI stores its data inside the container by default. To persist Open WebUI settings, mount a host directory; this is also the moment to republish the port on loopback only, as noted in step 7:
sudo mkdir -p /opt/openwebui-data
sudo docker rm -f openwebui
sudo docker run -d --name openwebui --restart=unless-stopped -p 127.0.0.1:3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v /opt/openwebui-data:/app/backend/data ghcr.io/open-webui/open-webui:main
Back up /usr/share/ollama/.ollama and /opt/openwebui-data with your usual backup tool (rsync, restic, Borg, or your enterprise solution). Exclude temporary caches if space is limited.
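For example, a minimal rsync to a mounted backup volume might look like this (/mnt/backup is a placeholder path; adjust to your layout):
sudo rsync -a /usr/share/ollama/.ollama /opt/openwebui-data /mnt/backup/ai-stack/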
10) Updating and troubleshooting
Update Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Update Open WebUI:
sudo docker pull ghcr.io/open-webui/open-webui:main && sudo docker restart openwebui
If the GPU is not used, verify nvidia-smi shows utilization during prompts and check logs:
journalctl -u ollama -f
If Open WebUI cannot reach Ollama, confirm the extra host mapping is present and that http://host.docker.internal:11434/api/tags returns JSON from inside the container; the bare root path returns the plain text "Ollama is running" rather than JSON. Also revisit the binding caveat from step 5: a strict 127.0.0.1 binding is unreachable from the container.
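To run that check from inside the container (assuming curl is available in the image; if not, install it temporarily or use the container’s shell):
sudo docker exec openwebui curl -s http://host.docker.internal:11434/api/tags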
11) Optional: Zero-trust exposure
If you need remote access, consider exposing Open WebUI through a zero-trust tunnel (Tailscale Funnel, Cloudflare Tunnel, or Nginx with client certificates) instead of opening raw ports. This keeps the Ollama API private while giving you secure, audited access from anywhere.
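As one illustration, Cloudflare’s quick-tunnel mode exposes the local port over an ephemeral trycloudflare.com URL without opening any inbound firewall port (named tunnels are the persistent, production option):
cloudflared tunnel --url http://127.0.0.1:3000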
Conclusion
You now have a production-ready local AI stack: Ollama for efficient model serving, Open WebUI for a clean interface, TLS for security, and sensible defaults to keep the API private. With regular updates and simple tuning, this setup is fast, safe, and easy to maintain on Ubuntu 24.04.