How to Self‑Host Open WebUI and Ollama on Ubuntu with Docker, HTTPS, and NVIDIA GPU Support

Overview

This guide shows how to self-host a private AI chatbot with Open WebUI (a clean, ChatGPT-like interface) and Ollama (for running local large language models) on Ubuntu 22.04 or 24.04. Everything runs in Docker, secured with HTTPS via Caddy and optional Basic Auth. If you have an NVIDIA GPU, you can enable GPU acceleration to speed up model inference dramatically.

What you will need

- An Ubuntu 22.04/24.04 server with at least 8 GB RAM and 20 GB free disk space. For GPU acceleration, an NVIDIA GPU with recent drivers is recommended (e.g., 8 GB VRAM or more for larger models).

- A domain name pointing to your server’s public IP (A/AAAA record). Ports 80 and 443 should be open to the internet for Let’s Encrypt. You can verify the DNS record with the check shown after this list.

- A non-root user with sudo privileges.
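
Before you continue, you can confirm that the domain already resolves to this server (a quick check, assuming your.domain.com is the placeholder you will replace with your real domain throughout this guide):

dig +short your.domain.com

The output should be your server’s public IP. If dig is not installed, add it with sudo apt install -y dnsutils.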

Step 1 — Install Docker and Docker Compose plugin

Update your system and install Docker from the official repository:

sudo apt update && sudo apt install -y ca-certificates curl gnupg

sudo install -m 0755 -d /etc/apt/keyrings

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

sudo usermod -aG docker $USER && newgrp docker
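
As a quick sanity check that Docker and the Compose plugin work without sudo:

docker --version

docker compose version

docker run --rm hello-world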

Step 2 — (Optional) Enable NVIDIA GPU for containers

Install the NVIDIA driver (if not already installed) and the NVIDIA container toolkit so Docker can access your GPU.

sudo ubuntu-drivers install

If you prefer to pin a specific driver version, install it directly instead, e.g., sudo apt install -y nvidia-driver-535.

sudo reboot
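
After the reboot, confirm the driver is loaded on the host before continuing; nvidia-smi should list your GPU along with the driver and CUDA versions:

nvidia-smi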

Install the container toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update && sudo apt install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker

sudo systemctl restart docker

Test GPU access:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Step 3 — Prepare Docker Compose and Caddy

Create a project folder and move into it:

mkdir -p ~/ai-stack && cd ~/ai-stack

Create a file named docker-compose.yml with the following content (your domain goes in the Caddyfile in the next step, not here):

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=2h
    ports:
      - "127.0.0.1:11434:11434"
    restart: unless-stopped
    # Uncomment the next line if you enabled NVIDIA toolkit
    # gpus: all

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - openwebui:/app/backend/data
    restart: unless-stopped

  caddy:
    image: caddy:2
    container_name: caddy
    depends_on:
      - openwebui
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config

volumes:
  ollama:
  openwebui:
  caddy_data:
  caddy_config:
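
Before moving on, you can ask Compose to validate the file, which catches YAML indentation mistakes early:

docker compose config

If the file is valid, Compose prints the resolved configuration; otherwise it reports the offending line.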

Create a file named Caddyfile in the same folder. Replace your.domain.com with your real domain and the email with yours:

your.domain.com {
    encode zstd gzip
    tls [email protected]
    # Optional Basic Auth — generate a hashed password below and uncomment
    # basicauth {
    #     admin <paste_hashed_password_here>
    # }
    reverse_proxy openwebui:8080
}
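
If you want to lint the Caddyfile before starting the stack, you can run Caddy’s validator from the same image the stack uses (a syntax check only; certificate issuance still depends on DNS and open ports):

docker run --rm -v $PWD/Caddyfile:/etc/caddy/Caddyfile:ro caddy:2 caddy validate --adapter caddyfile --config /etc/caddy/Caddyfile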

If you want Basic Auth, generate a hash:

docker run --rm caddy:2 caddy hash-password --plaintext "StrongPassword!"

Copy the hash output, paste it into the Caddyfile under basicauth, and uncomment the lines.

Step 4 — Start the stack and pull a model

Start the services:

docker compose up -d
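
Confirm that all three containers are running:

docker compose ps

Each service should show a running state; if one keeps restarting, inspect it with docker compose logs <service>.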

Pull a model with Ollama. Llama 3.1 is a great default; you can also choose smaller variants if you have less VRAM:

docker exec -it ollama ollama pull llama3.1

For low VRAM systems, try a quantized build like llama3.1:8b-instruct-q4_0 or a compact model like mistral:7b-instruct:

docker exec -it ollama ollama pull mistral:7b-instruct
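
You can list the models Ollama has downloaded at any time:

docker exec -it ollama ollama list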

Verify Ollama is up:

curl -s http://127.0.0.1:11434/api/tags

Step 5 — Access Open WebUI over HTTPS

Wait 30–60 seconds for Caddy to obtain a Let’s Encrypt certificate. Then browse to https://your.domain.com. On the first visit, create your Open WebUI admin user. In Settings > Models, select the model you pulled with Ollama. You can now chat privately with your local LLM through a friendly web interface.
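
You can also check the certificate from the command line (assuming your.domain.com is the domain you configured in the Caddyfile):

curl -I https://your.domain.com

A response with HTTP headers over TLS means Caddy obtained a certificate; a 401 status simply means Basic Auth is enabled and working.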

Step 6 — Security hardening (recommended)

- Keep Open WebUI behind Caddy only. The compose file already binds Open WebUI and Ollama to localhost (127.0.0.1), so they are not reachable directly from outside the host.

- Enable Basic Auth in your Caddyfile if you plan to expose the site to the open internet. Use a long, unique password.

- Restrict admin features in Open WebUI to your own account. Disable public sign-ups if you do not need them.

- Consider a firewall that allows inbound 80/443 (plus SSH) only. Ports 8080 and 11434 are already bound to localhost, so they are not exposed to the WAN; see the UFW sketch after this list.
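
If you use UFW, a minimal rule set looks like this (a sketch assuming you administer the server over SSH; adjust to your own access needs before enabling):

sudo ufw allow OpenSSH

sudo ufw allow 80/tcp

sudo ufw allow 443/tcp

sudo ufw enable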

Step 7 — Backups and updates

Back up Open WebUI data. Note that Compose prefixes named volumes with the project name, which defaults to the folder name (ai-stack here); run docker volume ls to confirm the exact names:

docker run --rm -v ai-stack_openwebui:/d -v $PWD:/b busybox tar czf /b/openwebui-backup.tgz -C /d .

Back up Ollama models (the archive can be large):

docker run --rm -v ai-stack_ollama:/d -v $PWD:/b busybox tar czf /b/ollama-backup.tgz -C /d .
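
To restore a backup, stop the stack and extract the archive back into the volume (a sketch assuming the same volume names and a backup file in the current directory):

docker compose down

docker run --rm -v ai-stack_openwebui:/d -v $PWD:/b busybox tar xzf /b/openwebui-backup.tgz -C /d

docker compose up -d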

To update containers:

docker compose pull && docker compose up -d

To remove old images:

docker image prune -f

Troubleshooting

- Check logs if something fails to start: docker compose logs -f

- Verify DNS and port 80/443 reach the server; Let’s Encrypt must connect over HTTP/HTTPS the first time.

- If certificates fail, restart the stack after DNS propagates: docker compose down && docker compose up -d

- If the GPU is not detected, confirm nvidia-smi works on the host and that you added gpus: all under the Ollama service.

- Test the Ollama API locally: curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1","prompt":"hi"}'

Where to go next

Explore model variants optimized for your hardware (Q4 for low VRAM, Q6/Q8 for higher quality, FP16 on strong GPUs). Add embeddings and RAG features in Open WebUI to chat over your documents. With this setup, you keep your data and traffic on your own server, with clean HTTPS, optional password protection, and fast local inference.
