Overview
This step-by-step guide shows how to deploy a private, local AI stack on Ubuntu using Docker: Ollama for running large language models (LLMs) and Open WebUI as a fast, friendly chat interface. The setup works on CPUs and supports NVIDIA GPUs for acceleration. You will get a secure, self-hosted environment where you can run models like Llama 3.2, Phi-4, and Mistral without sending data to the cloud.
Prerequisites
- Ubuntu 22.04 or 24.04 (server or desktop)
- 16 GB RAM recommended (more for larger models), 30+ GB free disk
- Docker Engine and the Docker Compose plugin
- Optional: NVIDIA GPU with proprietary driver installed (e.g., 535+)
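You can quickly check these requirements before you begin; the commands below only read system information (nvidia-smi is present only if the NVIDIA driver is installed):
free -h       # installed and available RAM
df -h /       # free disk space on the root filesystem
nvidia-smi    # GPU model and driver version (optional, NVIDIA only)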
1) Install Docker and Docker Compose
Update your system and install Docker Engine from Docker's official APT repository, which provides current, maintained packages rather than the older versions in Ubuntu's default archive.
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
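Verify the installation (if docker still asks for sudo, log out and back in so the group change takes full effect):
docker --version
docker compose version
docker run --rm hello-world   # pulls and runs a small test container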
2) (Optional) Enable NVIDIA GPU in Containers
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can pass the GPU into containers:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify your driver with nvidia-smi. The container will get GPU access when you run it with --gpus all.
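A quick way to confirm that containers can reach the GPU is to run nvidia-smi inside a disposable container; the toolkit injects the driver utilities, so a plain Ubuntu image is enough:
docker run --rm --gpus all ubuntu nvidia-smi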
3) Create a Docker Network and Volumes
Create a dedicated network so services can talk by name and set up persistent storage:
docker network create llmnet
docker volume create ollama
docker volume create open-webui
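A quick listing confirms both resources exist before you start any containers:
docker network ls | grep llmnet
docker volume ls | grep -E 'ollama|open-webui'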
4) Run the Ollama Container
Start Ollama. For CPU-only:
docker run -d --name ollama --restart=unless-stopped --network llmnet -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
With NVIDIA GPU acceleration (detected automatically):
docker run -d --name ollama --restart=unless-stopped --network llmnet --gpus all -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
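Ollama listens on port 11434 as soon as the container is up; a quick check from the host:
curl http://localhost:11434/    # should print "Ollama is running"
docker logs --tail 20 ollama    # startup messages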
5) Pull a Model
You can manage models from the host via docker exec. Pull a lightweight model to start quickly:
docker exec -it ollama ollama pull llama3.2:3b
Test generation from the command line (by default the response streams back as a series of JSON objects, one per line):
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line."}'
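If you prefer a single JSON response instead of a stream, pass "stream": false in the request body:
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line.","stream":false}'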
6) Launch Open WebUI
Open WebUI provides a clean chat interface and model manager. Start it on port 3000 and point it to the Ollama endpoint:
docker run -d --name open-webui --restart=unless-stopped --network llmnet -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest
Open a browser and visit http://localhost:3000 (or your server IP). Create the first admin account, select a model (e.g., llama3.2:3b), and start chatting. If a model is missing, Open WebUI can pull it automatically via Ollama.
7) Optional: Use Docker Compose
Prefer to keep everything in a single file? Create docker-compose.yml in an empty folder and paste:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    networks: [llmnet]
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    networks: [llmnet]
    volumes:
      - open-webui:/app/backend/data

networks:
  llmnet:
    external: true

volumes:
  ollama:
    external: true
  open-webui:
    external: true
Start with docker compose up -d from the folder containing docker-compose.yml (the llmnet network and the two volumes from step 3 must already exist, since they are declared external). For GPU acceleration, either keep using the docker run command with --gpus all or add a device reservation to the ollama service, as sketched below.
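A minimal sketch of that device reservation (assuming the NVIDIA Container Toolkit from step 2 is installed); merge it into the existing ollama: service in your Compose file:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]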
8) Securing and Updating
- Restrict access: if running on a server, firewall ports 11434 and 3000 so only trusted IPs can reach them, or publish them only on localhost (e.g., -p 127.0.0.1:3000:8080).
- Reverse proxy: place Nginx or Caddy in front with HTTPS for remote access.
- Updates: pull newer images and recreate the containers: docker pull ollama/ollama:latest and docker pull ghcr.io/open-webui/open-webui:latest, then stop, remove, and re-run each container as shown below. Your data persists in the named volumes.
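A typical update cycle, using the container names from steps 4 and 6 (with Compose, docker compose pull followed by docker compose up -d does the same):
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama
docker rm open-webui ollama
# re-run the docker run commands from steps 4 and 6; the named volumes keep your models and chat history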
9) Troubleshooting
- Check logs: docker logs -f ollama and docker logs -f open-webui.
- Port in use: change published ports (e.g., -p 3001:8080).
- GPU not detected: validate nvidia-smi, reinstall the NVIDIA Container Toolkit, and ensure --gpus all is present.
- Disk space: models are large; prune unused Docker data with docker system prune and remove models you no longer need with ollama rm, as shown after this list.
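For example, to see which models are using space in the ollama volume and remove one (the model tag below is just an example):
docker exec -it ollama ollama list            # installed models and their sizes
docker exec -it ollama ollama rm mistral:7b   # delete a model you no longer use
docker system prune                           # remove unused Docker data (asks for confirmation)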
10) Quick API and CLI Examples
- Pull another model: docker exec -it ollama ollama pull phi4:latest
- Chat from CLI: docker exec -it ollama ollama run mistral:7b
- Simple REST call: curl http://localhost:11434/api/generate -d '{"model":"phi4:latest","prompt":"Give me two bullet points about container security."}'
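Ollama also exposes a chat-style endpoint, /api/chat, that accepts a list of messages; a non-streaming example with the phi4 model pulled above:
curl http://localhost:11434/api/chat -d '{"model":"phi4:latest","messages":[{"role":"user","content":"Name one benefit of running LLMs locally."}],"stream":false}'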
You now have a modern, private AI stack using Docker, Ollama, and Open WebUI on Ubuntu. It is fast, flexible, and ready for local development, internal knowledge assistants, and offline experimentation—no cloud required.