Deploy Ollama with GPU and Open WebUI on Ubuntu 24.04: Secure, Optimize, and Update

Overview

This guide shows how to deploy Ollama with GPU acceleration on Ubuntu 24.04 and connect it to Open WebUI for a friendly chat interface. You will set up drivers, install Ollama, run Open WebUI in Docker, lock down network access, and tune performance. The result is a secure, fast, and maintainable local AI stack suitable for developers, helpdesk teams, and privacy-focused environments.

Prerequisites

You need an Ubuntu 24.04 server (or desktop) with at least 16 GB RAM, a modern CPU, and optionally an NVIDIA GPU (Turing or newer recommended). You also need sudo access and an open outbound internet connection to fetch packages and models.
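
Before you start, you can quickly confirm the basics with standard tools that ship with Ubuntu (lspci comes from pciutils, which is normally preinstalled):

head -n 2 /etc/os-release

free -h

lspci | grep -i nvidia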

1) Update the system

sudo apt update && sudo apt -y upgrade && sudo reboot

2) Install NVIDIA drivers (GPU users)

If you have an NVIDIA GPU, install the proprietary driver to enable acceleration; Ollama bundles the CUDA runtime libraries it needs, so the driver alone is sufficient. Ubuntu’s repositories provide a suitable default (on a minimal server image you may need sudo apt -y install ubuntu-drivers-common first).

sudo ubuntu-drivers autoinstall

sudo reboot

Verify the driver is working:

nvidia-smi

If you see your GPU and driver version, you are ready. CPU-only users can skip this section; Ollama will fall back to CPU automatically.
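
For a compact summary that is easy to paste into a ticket, nvidia-smi can print just the fields of interest (these are standard nvidia-smi query options):

nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv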

3) Install Ollama

Ollama provides a simple installer that configures a systemd service and the command-line client.

curl -fsSL https://ollama.com/install.sh | sh

After installation, the service runs on localhost port 11434 by default. Confirm status:

systemctl status ollama
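
Beyond the service status, a quick way to confirm the HTTP API answers is the version endpoint, a standard Ollama API route:

curl -s http://127.0.0.1:11434/api/version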

Pull a model to test the pipeline. Llama 3 is a popular general-purpose model:

ollama pull llama3

Run a quick prompt:

ollama run llama3 "Write one sentence about Ubuntu 24.04."
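
The same model is also reachable over the REST API, which is what Open WebUI will use later. A minimal sketch using the documented /api/generate route (stream set to false returns a single JSON response instead of a token stream):

curl -s http://127.0.0.1:11434/api/generate -d '{"model": "llama3", "prompt": "Write one sentence about Ubuntu 24.04.", "stream": false}'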

4) Configure Ollama’s listener and keep it private

By default, Ollama listens only on 127.0.0.1, so the API is not reachable from other machines. Confirm the binding:

ss -ltnp | grep 11434

Open WebUI will run in a Docker container (step 5) and reaches the host over the Docker bridge, which a loopback-only listener cannot serve. Create a systemd override and set the bind address explicitly; the firewall rules in step 7 keep port 11434 closed to the internet.

sudo systemctl edit ollama

Add the following lines, then save:

[Service]
Environment=OLLAMA_HOST=0.0.0.0:11434

sudo systemctl daemon-reload && sudo systemctl restart ollama
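
To confirm the override took effect, check the unit’s environment and the new listener (both are standard systemd and iproute2 commands):

sudo systemctl show ollama --property=Environment

ss -ltnp | grep 11434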

5) Install Docker and run Open WebUI

Open WebUI is a lightweight, modern web front end that can connect to a local Ollama. Install Docker, then run Open WebUI as a container.

sudo apt -y install docker.io

sudo systemctl enable --now docker

Run Open WebUI and publish it on port 3000, bound to the host’s loopback address so only the reverse proxy added in step 6 can reach it (Docker-published ports bypass UFW, so the firewall alone would not protect it). The extra host mapping lets the container reach Ollama on the host via host.docker.internal.

sudo docker run -d --name openwebui --restart=unless-stopped -p 127.0.0.1:3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:main

Because the port is bound to loopback, browse to https://ai.example.com once the proxy in step 6 is in place, or reach it for initial setup through an SSH tunnel (ssh -L 3000:127.0.0.1:3000 user@server-ip, then open http://localhost:3000). On first launch, create an admin user. In Settings, confirm the Ollama endpoint is http://host.docker.internal:11434.
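
If the page does not load, confirm the container is up and watch its startup log (the first start can take a minute while Open WebUI initializes its database):

sudo docker ps --filter name=openwebui

sudo docker logs -f openwebui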

6) Add HTTPS with Nginx (recommended)

Protect logins and prompts with TLS. Point a DNS record (e.g., ai.example.com) to your server, then install Nginx and Certbot.

sudo apt -y install nginx certbot python3-certbot-nginx

Create a basic reverse proxy for Open WebUI on port 3000. Quoting the heredoc delimiter stops the shell from expanding nginx’s $ variables before they reach the config file, and the Upgrade headers let Open WebUI’s WebSocket connections pass through:

sudo tee /etc/nginx/sites-available/openwebui.conf >/dev/null <<'EOF'
server {
    listen 80;
    server_name ai.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/openwebui.conf /etc/nginx/sites-enabled/

sudo nginx -t && sudo systemctl reload nginx

Issue a free certificate:

sudo certbot --nginx -d ai.example.com --redirect
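
Certbot installs a timer that renews the certificate automatically; a dry run confirms the renewal path works without issuing anything:

sudo certbot renew --dry-run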

7) Lock down the firewall

Use UFW to allow only SSH and HTTPS from the outside. Do not expose port 11434 (Ollama) or 3000 (Open WebUI) directly to the internet: Open WebUI is already published on 127.0.0.1 only, and Ollama needs nothing more than a narrow rule for the Docker bridge, shown after the commands below.

sudo apt -y install ufw

sudo ufw default deny incoming

sudo ufw allow OpenSSH

sudo ufw allow 443/tcp

sudo ufw enable
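
Because the Open WebUI container reaches Ollama across the Docker bridge, add one narrow rule for that path and then review the result. The subnet below is Docker’s usual default bridge range; confirm yours with docker network inspect bridge:

sudo ufw allow from 172.17.0.0/16 to any port 11434 proto tcp

sudo ufw status verbose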

8) Performance tuning

If you have a GPU, keep models resident in GPU memory between requests so they are not unloaded and reloaded for every prompt. Create an environment file for Ollama and raise the keep-alive window (the default is five minutes).

sudo mkdir -p /etc/ollama

echo 'OLLAMA_KEEP_ALIVE=30m' | sudo tee /etc/ollama/environment

Bind the environment file via systemd:

sudo systemctl edit ollama

Add:

[Service]
EnvironmentFile=/etc/ollama/environment

sudo systemctl daemon-reload && sudo systemctl restart ollama

Other useful variables: OLLAMA_NUM_PARALLEL to control concurrent requests per model, OLLAMA_FLASH_ATTENTION=1 on supported GPUs (optionally with OLLAMA_KV_CACHE_TYPE=q8_0 to shrink the KV cache), OLLAMA_MAX_LOADED_MODELS to cap how many models stay loaded, and OMP_NUM_THREADS for CPU-bound workloads. For large models, ensure enough disk space under the model directory, which is /usr/share/ollama/.ollama/models for the systemd service installed above.
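
As a sketch, a combined /etc/ollama/environment might look like the following; the values are examples to tune for your hardware, not recommendations:

OLLAMA_KEEP_ALIVE=30m
OLLAMA_NUM_PARALLEL=2
OLLAMA_FLASH_ATTENTION=1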

9) Backups and persistence

Models pulled through the systemd service live in /usr/share/ollama/.ollama/models (~/.ollama/models applies only when you run Ollama as your own user), and Open WebUI stores its data inside the container by default. To persist Open WebUI settings, mount a host directory:

sudo mkdir -p /opt/openwebui-data

sudo docker rm -f openwebui

sudo docker run -d --name openwebui --restart=unless-stopped -p 127.0.0.1:3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v /opt/openwebui-data:/app/backend/data ghcr.io/open-webui/open-webui:main

Back up /usr/share/ollama/.ollama (or ~/.ollama for a per-user setup) and /opt/openwebui-data with your usual backup tool (rsync, restic, Borg, or your enterprise solution). Exclude temporary caches if space is limited.
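
A minimal rsync sketch to a mounted backup disk (the /mnt/backup path is an example, not something created earlier in this guide):

sudo rsync -a --delete /usr/share/ollama/.ollama/ /mnt/backup/ollama/

sudo rsync -a --delete /opt/openwebui-data/ /mnt/backup/openwebui-data/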

10) Updating and troubleshooting

Update Ollama:

curl -fsSL https://ollama.com/install.sh | sh
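
After the installer finishes, confirm the new version:

ollama --version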

Update Open WebUI by pulling the new image and recreating the container; a plain restart keeps the old image running. The data volume from step 9 preserves your settings:

sudo docker pull ghcr.io/open-webui/open-webui:main

sudo docker rm -f openwebui

Then re-run the docker run command from step 9.

If the GPU is not used, verify nvidia-smi shows utilization during prompts and check logs:

journalctl -u ollama -f

If Open WebUI cannot reach Ollama, confirm the extra host mapping is present and that the API answers from inside the container, for example with sudo docker exec openwebui curl -s http://host.docker.internal:11434/api/version (assuming curl is available in the image), which should return a small JSON version document. Also check that the UFW rule for the Docker bridge from step 7 is in place.

11) Optional: Zero-trust exposure

If you need remote access, consider exposing Open WebUI through a zero-trust tunnel (Tailscale Funnel, Cloudflare Tunnel, or Nginx with client certificates) instead of opening raw ports. This keeps the Ollama API private while giving you secure, audited access from anywhere.

Conclusion

You now have a production-ready local AI stack: Ollama for efficient model serving, Open WebUI for a clean interface, TLS for security, and sensible defaults to keep the API private. With regular updates and simple tuning, this setup is fast, safe, and easy to maintain on Ubuntu 24.04.
