Overview
Running a local large language model (LLM) is easier than ever thanks to Ollama and Open WebUI. Ollama handles model downloads and inference, while Open WebUI gives you a clean, chat-style interface in your browser. In this tutorial, you'll install both on Ubuntu 24.04 (works on 22.04 too), enable NVIDIA GPU acceleration, and deploy them with Docker Compose. The result is a fast, private AI stack you control.
What You'll Need
- Ubuntu 24.04 or 22.04 (fresh or existing server/desktop).
- An NVIDIA GPU with recent drivers (Turing/RTX or newer recommended).
- Root or sudo access.
- Open ports 3000 (Open WebUI) and 11434 (Ollama) on your firewall if you will access the stack remotely (a ufw sketch follows this list).
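If you use ufw (Ubuntu's default firewall frontend), a minimal sketch might look like the following; 192.168.1.0/24 is only an example subnet, so substitute your own trusted range:
# Keep SSH reachable before enabling the firewall
sudo ufw allow 22/tcp
# Allow Open WebUI and the Ollama API from a trusted subnet only (example range)
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw enable
sudo ufw status verbose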
Step 1: Install NVIDIA Drivers and Verify CUDA
First, update your system and install the recommended NVIDIA driver. On Ubuntu Desktop you can use the Additional Drivers tool, but the CLI route works everywhere:
sudo apt update && sudo apt -y upgrade
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, confirm the GPU is visible:
nvidia-smi
If you see a driver table with your GPU, you're set. If not, re-run the install or check Secure Boot status (disable or enroll the MOK as needed).
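If you suspect Secure Boot, you can check its state and confirm whether the NVIDIA kernel module loaded; this assumes the mokutil package, which ships on most Ubuntu installs:
# Reports "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state
# The nvidia modules should appear here once the driver is working
lsmod | grep nvidia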
Step 2: Install Docker, Compose, and NVIDIA Container Toolkit
Install Docker from the official repository so you get the latest engine and the Compose plugin:
sudo apt-get remove -y docker docker.io containerd runc || true
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Add the NVIDIA Container Toolkit so Docker can pass the GPU through to containers. NVIDIA's current apt repository is distribution-agnostic, so the same commands work on 22.04 and 24.04:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify Docker can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Step 3: Create a Docker Compose File for Ollama and Open WebUI
Create a working directory and a docker-compose.yml:
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
nano docker-compose.yml
Paste the following content, then save:
version: "3.9"services: ollama: image: ollama/ollama:latest container_name: ollama restart: unless-stopped ports: - "11434:11434" volumes: - ollama:/root/.ollama environment: - OLLAMA_KEEP_ALIVE=24h gpus: all openwebui: image: ghcr.io/open-webui/open-webui:main container_name: open-webui restart: unless-stopped ports: - "3000:8080" environment: - OLLAMA_API_BASE_URL=http://ollama:11434 - ENABLE_AUTH=True volumes: - openwebui:/app/backend/data depends_on: - ollamavolumes: ollama: openwebui:
Bring everything up:
docker compose up -d
Open WebUI will be available at http://<your-server-ip>:3000 and Ollama's API at http://<your-server-ip>:11434.
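Before opening the browser, you can sanity-check the stack from the shell. The /api/tags endpoint lists the models Ollama has stored locally, so an empty list at this point is expected:
# Both containers should show as running
docker compose ps
# Ollama's API answers on port 11434 (no models pulled yet)
curl http://localhost:11434/api/tags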
Step 4: Pull a Model and Test Inference
Use Ollama to pull an LLM. Llama 3 8B is a good starting point if you have at least ~8–10 GB of free VRAM:
docker exec -it ollama ollama pull llama3:8b
You can test quickly from the CLI:
docker exec -it ollama ollama run llama3:8b
Or open your browser and navigate to Open WebUI (port 3000). Create an account on first visit, select the model you pulled, and start chatting. If the GPU is being used, you should see activity in:
watch -n 1 nvidia-smi
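You can also drive the model through Ollama's HTTP API, which is handy for scripting; the prompt below is just a placeholder, and "stream": false requests a single JSON response instead of a token stream:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Explain Docker volumes in one sentence.",
  "stream": false
}'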
Step 5: Secure and Maintain the Stack
- Firewall: Allow only needed ports (adjust to your network policy). For local-only use, block remote access to 3000/11434.
- Reverse proxy: For TLS and a friendly domain, put Nginx or Caddy in front of Open WebUI and obtain a Let's Encrypt certificate (a minimal Caddy example follows the backup commands below).
- Updates: Keep images fresh and restart the stack regularly:
docker compose pull
docker compose up -d
Back up the volumes so you don't lose chats or downloaded models. Note that Compose prefixes volume names with the project name (the directory name by default), so confirm the exact names with docker volume ls; with the directory used above they will be ollama-openwebui_ollama and ollama-openwebui_openwebui:
docker run --rm -v ollama-openwebui_ollama:/data -v "$(pwd)":/backup alpine tar czf /backup/ollama-vol.tgz -C /data .
docker run --rm -v ollama-openwebui_openwebui:/data -v "$(pwd)":/backup alpine tar czf /backup/openwebui-vol.tgz -C /data .
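As a sketch of the reverse-proxy setup mentioned above, a minimal Caddyfile could look like this; ai.example.com is a placeholder domain pointed at this server, and Caddy requests the Let's Encrypt certificate automatically:
ai.example.com {
    # Caddy terminates TLS and forwards requests to Open WebUI
    reverse_proxy 127.0.0.1:3000
}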
Troubleshooting
- No GPU in containers: Confirm the toolkit is active. Check docker info | grep -i nvidia. Re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker.
- Model out-of-memory (OOM): Use a smaller model or quantized variant (e.g., llama3:8b-instruct-q4_0). Close other GPU apps. You can also reduce context in Open WebUI settings.
- Slow generation: Ensure you're not falling back to CPU (watch nvidia-smi). Update drivers and Docker images. Use recent CUDA-compatible drivers (550+ often recommended).
- Open WebUI cannot reach Ollama: Check that the OLLAMA_BASE_URL environment variable is set to http://ollama:11434. View logs with docker logs open-webui and docker logs ollama.
- Port conflicts: Change the host ports in docker-compose.yml (e.g., map "127.0.0.1:3000:8080" to bind only locally).
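For the local-only binding in the last item, the relevant excerpt of docker-compose.yml would look like this (the rest of the service definition stays the same):
  openwebui:
    ports:
      - "127.0.0.1:3000:8080"   # reachable only from this machine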
Where Models Are Stored and How to Clean Up
Models live in the Ollama volume (/root/.ollama inside the container). To list installed models:
docker exec -it ollama ollama list
Remove a model you no longer need:
docker exec -it ollama ollama rm llama3:8b
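To see how much disk space the model store is using, you can ask Docker for per-volume usage, or measure the models directory inside the container (the latter assumes the image ships coreutils, which the current Ubuntu-based ollama/ollama image does):
# Per-volume sizes; look for the volume ending in _ollama
docker system df -v
# Size of the model store inside the container
docker exec -it ollama du -sh /root/.ollama/models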
If you ever want to stop and remove the stack:
docker compose down
To reclaim space including volumes (this deletes your models and chat history), run:
docker compose down -v
Wrap-Up
You now have a private, GPU-accelerated LLM environment powered by Ollama and Open WebUI on Ubuntu. With Docker Compose, updates and maintenance are straightforward, and volumes keep your data persistent. From here, try different models (Mistral, Phi-3, Llama 3 Instruct), experiment with prompt templates, and fine-tune performance for your hardware. Enjoy your local AI workstation or server—no cloud required.
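For example, Mistral 7B and Phi-3 are available in the Ollama library under these names at the time of writing:
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull phi3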