Overview
This guide shows how to self-host a private AI chatbot with Open WebUI (a clean, ChatGPT-like interface) and Ollama (for running local large language models) on Ubuntu 22.04 or 24.04. Everything runs in Docker, secured with HTTPS via Caddy and optional Basic Auth. If you have an NVIDIA GPU, you can enable GPU acceleration to speed up model inference dramatically.
What you will need
- An Ubuntu 22.04/24.04 server with at least 8 GB RAM and 20 GB free disk space. For GPU acceleration, an NVIDIA GPU with recent drivers is recommended (e.g., 8 GB VRAM or more for larger models).
- A domain name pointing to your server’s public IP (A/AAAA record); a quick way to verify this is shown after this list. Ports 80 and 443 should be open to the internet for Let’s Encrypt.
- A non-root user with sudo privileges.
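Before you start, you can sanity-check the DNS record mentioned above. A quick sketch (dig is in the dnsutils package; replace your.domain.com with your real domain):
dig +short your.domain.com
The output should match your server’s public IP, which you can print from the server itself with a service like curl -4 ifconfig.me.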
Step 1 — Install Docker and Docker Compose plugin
Update your system and install Docker from the official repository:
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
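Before moving on, verify that Docker runs without sudo (if the group change has not taken effect in your shell, log out and back in first):
docker run --rm hello-world
A short “Hello from Docker!” message confirms the daemon and your group membership are working.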
Step 2 — (Optional) Enable NVIDIA GPU for containers
Install the NVIDIA driver (if not already installed) and the NVIDIA container toolkit so Docker can access your GPU.
sudo ubuntu-drivers install
(Or pick a specific driver, e.g., sudo apt install -y nvidia-driver-535.)
sudo reboot
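After the reboot, confirm the driver is loaded before continuing:
nvidia-smi
You should see a table with your GPU model and driver version. If the command is missing or errors out, fix the driver install before adding the container toolkit.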
Install the container toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test GPU access:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
Step 3 — Prepare Docker Compose and Caddy
Create a project folder and move into it:
mkdir -p ~/ai-stack && cd ~/ai-stack
Create a file named docker-compose.yml with the following content (you will set your real domain in the Caddyfile in the next step):
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=2h
    ports:
      - "127.0.0.1:11434:11434"
    restart: unless-stopped
    # Uncomment the next line if you enabled the NVIDIA toolkit.
    # Recent Docker Compose releases support this shorthand; older ones
    # need the deploy.resources.reservations.devices form instead.
    # gpus: all

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - openwebui:/app/backend/data
    restart: unless-stopped

  caddy:
    image: caddy:2
    container_name: caddy
    depends_on:
      - openwebui
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config

volumes:
  ollama:
  openwebui:
  caddy_data:
  caddy_config:
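Before starting anything, it is worth letting Compose parse the file; this catches YAML indentation mistakes early:
docker compose config
If the file is valid, Compose prints the fully resolved configuration; otherwise it reports the offending line.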
Create a file named Caddyfile in the same folder. Replace your.domain.com with your real domain and the email address with your own:
your.domain.com {
    encode zstd gzip
    tls [email protected]

    # Optional Basic Auth: generate a hashed password (see below), then uncomment.
    # On Caddy versions before 2.8, the directive is named basicauth.
    # basic_auth {
    #     admin <paste_hashed_password_here>
    # }

    reverse_proxy openwebui:8080
}
If you want Basic Auth, generate a hash:
docker run --rm caddy:2 caddy hash-password --plaintext "StrongPassword!"
Copy the hash output, paste it into the Caddyfile inside the basic_auth block, and uncomment those lines.
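You can also validate the Caddyfile syntax without starting the stack (this runs Caddy’s built-in validator against the file in the current folder):
docker run --rm -v $PWD/Caddyfile:/etc/caddy/Caddyfile:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile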
Step 4 — Start the stack and pull a model
Start the services:
docker compose up -d
Pull a model with Ollama. Llama 3.1 is a great default; you can also choose smaller variants if you have less VRAM:
docker exec -it ollama ollama pull llama3.1
For low-VRAM systems, try a quantized build like llama3.1:8b-instruct-q4_0 or a compact model like mistral:7b-instruct:
docker exec -it ollama ollama pull mistral:7b-instruct
Verify Ollama is up:
curl -s http://127.0.0.1:11434/api/tags
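You can also list the models Ollama has downloaded so far:
docker exec -it ollama ollama list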
Step 5 — Access Open WebUI over HTTPS
Wait 30–60 seconds for Caddy to obtain a Let’s Encrypt certificate. Then browse to https://your.domain.com. On the first visit, create your Open WebUI admin user. In Settings > Models, select the model you pulled with Ollama. You can now chat privately with your local LLM through a friendly web interface.
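If you prefer to check from the command line first, curl the site and inspect the response headers (replace the domain with yours):
curl -I https://your.domain.com
An HTTP 200 (or 401 if you enabled Basic Auth) over a valid certificate means Caddy obtained its certificate and is proxying correctly.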
Step 6 — Security hardening (recommended)
- Keep Open WebUI behind Caddy only. The compose file already binds its published port to localhost (127.0.0.1), so it is not reachable directly from the internet.
- Enable Basic Auth in your Caddyfile if you plan to expose the site to the open internet. Use a long, unique password.
- Restrict admin features in Open WebUI to your own account. Disable public sign-ups if you do not need them.
- Consider a firewall rule to allow inbound 80/443 only, and block 8080/11434 from the WAN (see the UFW example after this list).
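A minimal UFW sketch for the firewall rule above (UFW ships with Ubuntu; adjust the SSH rule if you use a non-standard port):
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose
Note that 8080 and 11434 are already bound to 127.0.0.1 in the compose file, so these rules are defense in depth rather than the only barrier.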
Step 7 — Backups and updates
Back up Open WebUI data. Note that Compose prefixes volume names with the project (folder) name, so the volumes created above are ai-stack_openwebui and ai-stack_ollama; confirm with docker volume ls:
docker run --rm -v ai-stack_openwebui:/d -v $PWD:/b busybox tar czf /b/openwebui-backup.tgz -C /d .
Back up Ollama models (the archive can be large):
docker run --rm -v ai-stack_ollama:/d -v $PWD:/b busybox tar czf /b/ollama-backup.tgz -C /d .
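To restore, stop the stack and unpack the archive back into the volume. A sketch using the same busybox approach (double-check the volume name with docker volume ls first):
docker compose down
docker run --rm -v ai-stack_openwebui:/d -v $PWD:/b busybox tar xzf /b/openwebui-backup.tgz -C /d
docker compose up -d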
To update containers:
docker compose pull && docker compose up -d
To remove old images:
docker image prune -f
Troubleshooting
- Check logs if something fails to start: docker compose logs -f
- Verify DNS and port 80/443 reach the server; Let’s Encrypt must connect over HTTP/HTTPS the first time.
- If certificates fail, restart the stack after DNS propagates: docker compose down && docker compose up -d
- If the GPU is not detected, confirm nvidia-smi works on the host and that you uncommented gpus: all under the ollama service.
- Test the Ollama API locally: curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1","prompt":"hi"}'
Where to go next
Explore model variants optimized for your hardware (Q4 for low VRAM, Q6/Q8 for higher quality, FP16 on strong GPUs). Add embeddings and RAG features in Open WebUI to chat over your documents. With this setup, you keep your data and traffic on your own server, with clean HTTPS, optional password protection, and fast local inference.
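For example, to try a higher-quality quantization of the same model (tags change over time, so check the Ollama model library for what is currently published):
docker exec -it ollama ollama pull llama3.1:8b-instruct-q8_0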