Overview
This guide shows you how to deploy Ollama and Open WebUI on Ubuntu using Docker Compose, with optional NVIDIA GPU acceleration and automatic HTTPS. You will get a clean, reproducible setup suitable for a home lab, a developer VM, or a small on-prem server. The steps are focused on Ubuntu 22.04/24.04 LTS, but will work on other modern distributions with minor changes.
What You Will Build
You will run three containers: Ollama (LLM runtime), Open WebUI (a friendly web front end), and Caddy (a reverse proxy that issues and renews free TLS certificates). Data will persist in Docker volumes so updates and restarts do not wipe your models or chat history.
Prerequisites
1) An Ubuntu server; at least 16 GB of RAM is recommended for medium-sized models (more is better).
2) A domain or subdomain (e.g., ai.example.com) pointed to your server’s public IP (A/AAAA record).
3) Ports 80 and 443 open to the Internet.
4) Optional: an NVIDIA GPU with recent drivers for acceleration.
5) A non-root user with sudo privileges.
Step 1 — Install Docker and Compose
Update the OS and install Docker Engine and the Compose plugin from Docker’s repository:
sudo apt update && sudo apt upgrade -y
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
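To confirm the installation, you can optionally check the versions and run Docker's hello-world image as a quick sanity test:

docker --version
docker compose version
docker run --rm hello-world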
Step 2 — (Optional) Enable NVIDIA GPU for Containers
If you have an NVIDIA GPU, install the driver and the NVIDIA Container Toolkit so Ollama can use CUDA.
Install drivers: sudo ubuntu-drivers autoinstall, then reboot. Verify with nvidia-smi.
Install the container toolkit:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) && \
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
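As a quick check that containers can see the GPU, you can run nvidia-smi inside a CUDA base image; the tag below is only an example and should be swapped for one that is actually available on Docker Hub:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi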
Step 3 — Prepare the Project
Create a directory for your stack and move into it:
mkdir -p ~/ollama-stack && cd ~/ollama-stack
We will create a docker-compose.yml and a Caddyfile. Replace ai.example.com with your own domain wherever it appears (Caddy does not require a contact email, and this Caddyfile does not set one).
Step 4 — Docker Compose File
Create docker-compose.yml with the content below. If you have a GPU, keep the deploy.resources.reservations.devices section; otherwise you can remove it.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - openwebui_data:/app/backend/data

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - openwebui

volumes:
  ollama_data:
  openwebui_data:
  caddy_data:
  caddy_config:
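Before launching anything, you can optionally ask Compose to validate the file; this catches indentation and syntax mistakes early:

docker compose config --quiet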
Step 5 — Caddy Reverse Proxy
Create Caddyfile with your domain. Caddy will automatically issue and renew a Let’s Encrypt certificate and proxy traffic to Open WebUI.
ai.example.com {
    encode gzip
    reverse_proxy openwebui:8080
}
Ensure your DNS A/AAAA record points to the server before continuing. If you only need local access, you can skip Caddy and access Open WebUI on http://SERVER_IP:8080 by publishing that port; however, TLS is strongly recommended.
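If you go that local-only route, the only change is publishing the port on the openwebui service (8080 is the port Open WebUI listens on inside the container, as the reverse_proxy line above shows). A minimal sketch of the addition:

  openwebui:
    # ...existing settings...
    ports:
      - "8080:8080"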
Step 6 — Launch the Stack
Start everything with Docker Compose:
docker compose up -d
Watch the logs for any errors, especially domain or certificate issues:
docker compose logs -f caddy
After a minute, visit https://ai.example.com and complete the initial Open WebUI setup. In Settings, verify the Ollama endpoint is http://ollama:11434 (it should be pre-set from the environment variable).
Step 7 — Pull a Model and Test
You can pull and manage models via the Open WebUI interface, or via the CLI inside the Ollama container:
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1
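You can also hit the Ollama API directly from the host, since this Compose file publishes port 11434; /api/tags lists installed models and /api/generate runs a one-off prompt:

curl http://localhost:11434/api/tags
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Say hello", "stream": false}'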
If you enabled GPU support, Ollama should automatically leverage CUDA. You can confirm GPU usage with nvidia-smi while running a prompt.
Security and Hardening Tips
- Create an admin user in Open WebUI and do not expose the Ollama API (port 11434) to the Internet unless you really need it externally. The Compose file above publishes 11434 on the host for local API access; if you do not need that, remove the ports mapping from the ollama service or bind it to localhost only (e.g., "127.0.0.1:11434:11434") so that only Caddy is reachable from outside on 80/443.
- Restrict access by IP or add basic auth in Caddy if you want a quick gate. Example inside your site block: basicauth { user JDJhJDEw$... } (generate hashes with caddy hash-password; a fuller site block is sketched after this list).
- Keep images updated: docker compose pull && docker compose up -d. Consider enabling automatic re-deploys on a schedule.
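For the basic auth tip, a complete site block might look like the sketch below; the username and hash are placeholders, and the hash is whatever caddy hash-password prints (newer Caddy releases also accept basic_auth as the directive name):

ai.example.com {
    encode gzip
    basicauth {
        admin <hash-from-caddy-hash-password>
    }
    reverse_proxy openwebui:8080
}

For scheduled updates, one simple option is a cron entry; the path below assumes the stack lives in /home/ubuntu/ollama-stack, so adjust it to your user and directory:

# crontab -e: pull new images and redeploy every Sunday at 04:00
0 4 * * 0 cd /home/ubuntu/ollama-stack && docker compose pull && docker compose up -d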
Performance Hints
- Use models that fit your VRAM/RAM. Smaller models with moderate quantization (for example, q4_K_M variants) work well on modest GPUs and CPUs. For CPU-only servers, prefer 7B or smaller models.
- Set up swap if RAM is tight: sudo fallocate -l 16G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile. Add an entry to /etc/fstab so the swap persists across reboots (see the example after this list).
- Place Docker volumes on fast storage (NVMe) for quicker model load times. You can bind-mount a directory like ./ollama:/root/.ollama if you prefer easy backups.
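The /etc/fstab entry mentioned in the swap hint is a single line; appending it with tee is one way to do it:

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab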
Backup and Restore
Back up the volumes for Ollama and Open WebUI to keep models and chat history. Note that Docker Compose prefixes volume names with the project (directory) name, so the actual volume may be called something like ollama-stack_ollama_data; check with docker volume ls and use that name in the commands below. Example quick backup of the model volume:
docker run --rm -v ollama_data:/data -v $(pwd):/backup alpine tar czf /backup/ollama-data.tgz -C /data .
Repeat similarly for openwebui_data. To restore, reverse the process by untarring into an identically named volume.
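As a concrete restore sketch, the command below untars the archive back into the volume; as noted above, substitute the exact volume name reported by docker volume ls:

docker run --rm -v ollama_data:/data -v $(pwd):/backup alpine tar xzf /backup/ollama-data.tgz -C /data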
Troubleshooting
- If Caddy fails to get a certificate, verify your DNS record, that ports 80/443 are reachable, and no other service (like another web server) is binding them.
- If GPU is not detected, confirm nvidia-smi works on the host and that the nvidia-container-toolkit is installed. Restart Docker and the containers after changes.
- If Open WebUI cannot reach Ollama, ensure the environment variable points to http://ollama:11434 and that both containers share the same default network (they do in this Compose file).
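A few generic checks help with all three of these issues; they are standard Compose and curl commands rather than anything specific to this stack:

docker compose ps                         # are all three containers running?
docker compose logs --tail=100 openwebui  # recent Open WebUI logs
docker compose restart ollama openwebui   # restart after config changes
curl http://localhost:11434/api/tags      # Ollama should answer on the host while 11434 is published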
Conclusion
You now have a production-grade, self-hosted LLM stack with Ollama and Open WebUI, managed by Docker Compose and protected by automatic HTTPS via Caddy. This setup is easy to maintain, portable across servers, and ready for experimentation or internal use. With GPU acceleration, you can serve sophisticated models efficiently; without a GPU, you can still run smaller quantized models for private inference. Keep your containers updated, monitor resource usage, and iterate on models that best fit your hardware and use cases.