Local large language models (LLMs) are now practical for developers, researchers, and privacy-focused teams. In this step-by-step guide, you will install and run Ollama (LLM runtime) and Open WebUI (a modern chat interface) on Ubuntu 22.04/24.04, with optional NVIDIA GPU acceleration. The setup uses Docker for easy updates, isolation, and backups.
Why this stack?
Ollama makes downloading and running models simple, offering a fast API on your machine. Open WebUI provides a sleek, extensible web app for chatting with multiple models, managing prompts, and moderating access. Together, they create a private, cost-effective alternative to cloud AI services.
Prerequisites
You need an Ubuntu 22.04/24.04 system with internet access and a user with sudo rights. If you have an NVIDIA GPU (recommended), you can enable GPU acceleration for much faster inference. CPU-only works too—skip the GPU steps if you do not have a supported GPU.
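If you want to confirm the basics before you start, the quick checks below are a minimal sketch (assuming default Ubuntu tooling such as lsb_release is available) for verifying the release, the presence of an NVIDIA GPU, and free disk space.
# Confirm the Ubuntu release and CPU architecture
lsb_release -ds && uname -m
# Check for an NVIDIA GPU (only needed for the optional GPU steps)
lspci | grep -i nvidia
# Check free disk space; each model can take several GB
df -h /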
Step 1: Update the system
Update packages to ensure you have the latest dependencies and security fixes.
sudo apt update && sudo apt -y upgrade
sudo reboot
Step 2 (Optional): Enable NVIDIA GPU support
Install the latest proprietary NVIDIA driver and the NVIDIA Container Toolkit so Docker can use your GPU, and reboot when asked. Note that the last two commands (the runtime configuration and the Docker restart) assume Docker is already installed; if it is not, run them after completing Step 3.
# Install recommended NVIDIA driver
sudo ubuntu-drivers install
sudo reboot
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU visibility
nvidia-smi
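If you want to double-check that the toolkit registered itself with Docker, nvidia-ctk writes a runtime entry into Docker's daemon configuration. The following quick checks assume Docker is already installed (Step 3):
# The nvidia runtime should appear in the daemon configuration
cat /etc/docker/daemon.json
# Docker should list nvidia among its available runtimes
docker info | grep -i runtime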
Step 3: Install Docker Engine and Docker Compose plugin
Install Docker from the official repository to get the latest stable version. Add your user to the docker group to run Docker without sudo.
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
# Quick test
docker run --rm hello-world
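If hello-world runs, Docker is working. It is also worth confirming the Compose plugin version, since the gpus: shorthand used in the next step needs a reasonably recent release:
docker --version
docker compose version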
Step 4: Create a Docker Compose file for Ollama + Open WebUI
Create a working directory and define the services. The configuration below enables the GPU when present; note that the gpus: all shorthand requires a reasonably recent Docker Compose plugin. For CPU-only, remove or comment out the gpus: all line under the ollama service.
mkdir -p ~/local-llm && cd ~/local-llm
nano docker-compose.yml
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # Comment out the next line if you are CPU-only
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=24h

  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:latest
    depends_on:
      - ollama
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_this_long_random_secret
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
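Before starting the stack, you can have Compose parse and render the file, which catches YAML indentation mistakes early:
# Validate the file and print the resolved configuration
docker compose config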
Step 5: Start the stack
Bring the services up in the background, then check their status. Open WebUI will be available at http://SERVER_IP:3000 and the Ollama API at http://SERVER_IP:11434 (use localhost if you are working on the server itself).
docker compose up -d
docker compose ps
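A couple of quick checks confirm that both services are responding; the curl below hits Ollama's version endpoint and should return a small JSON payload.
# Ollama should answer with its version
curl http://localhost:11434/api/version
# Follow Open WebUI startup logs (Ctrl+C to stop)
docker compose logs -f open-webui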
Step 6: Pull and run a model
Use Ollama to pull an LLM. You can pick models like llama3.2, mistral, or a coding model. The first pull downloads model weights, which can be several GB.
# Pull a general-purpose model
docker exec -it ollama ollama pull llama3.2
# Test it via CLI
docker exec -it ollama ollama run llama3.2 "Write a two-sentence summary of Ubuntu."
# Or use the API
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello!"}'
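The generate endpoint streams JSON chunks by default; if you prefer a single response, disable streaming. You can also list the models already stored in the ollama volume.
# Single, non-streamed response
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello!","stream":false}'
# List downloaded models
docker exec -it ollama ollama list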
Open your browser to http://SERVER_IP:3000, select the model from the dropdown, and start chatting. In Settings, you can change default models, system prompts, and appearance.
Step 7: Secure access
If exposing the WebUI beyond your LAN, add authentication. In Open WebUI, create an admin user at first login, then disable new sign-ups in Settings. For internet exposure, place a reverse proxy (Nginx, Caddy, or Traefik) with HTTPS (Let's Encrypt) in front of port 3000, as sketched below.
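As a minimal sketch, a Caddy configuration like the one below (assuming a domain such as chat.example.com already points at your server) proxies HTTPS traffic to the WebUI; Caddy obtains and renews the Let's Encrypt certificate automatically.
# /etc/caddy/Caddyfile (example domain; replace with your own)
chat.example.com {
    reverse_proxy localhost:3000
}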
Step 8: Update and backup
To update, pull the latest images and recreate containers without losing data stored in volumes. To back up, save the volumes before upgrades.
# Update images
docker compose pull
docker compose up -d
# Backup volumes (example)
docker run --rm -v local-llm_ollama:/data -v "$PWD":/backup \
busybox tar czf /backup/ollama-vol.tgz /data
docker run --rm -v local-llm_openwebui:/data -v "$PWD":/backup \
busybox tar czf /backup/openwebui-vol.tgz /data
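Restoring works the same way in reverse: with the stack stopped, mount the volume and extract the archive into it. A sketch, assuming the archive names created above:
docker compose down
docker run --rm -v local-llm_ollama:/data -v "$PWD":/backup \
busybox tar xzf /backup/ollama-vol.tgz -C /
docker compose up -d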
Troubleshooting
GPU not used: Run docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If it fails, recheck the driver and NVIDIA Container Toolkit steps. Ensure the gpus: all line is present and Docker was restarted.
Permission denied with Docker: You may need to log out and back in after adding your user to the docker group, or run newgrp docker.
Port conflicts: Change the left side of port mappings in docker-compose.yml (e.g., use "8081:8080" for WebUI) and restart.
Slow or failed model pull: Verify disk space and retry (see the commands below). Large models require several GB of free space in the ollama volume.
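These commands show where the space is going; Docker stores volumes under /var/lib/docker by default.
# Free space on the Docker data root
df -h /var/lib/docker
# Space used by images, containers, and volumes
docker system df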
Uninstall (optional)
To stop and remove everything, run:
cd ~/local-llm
docker compose down
docker volume rm local-llm_ollama local-llm_openwebui
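Alternatively, docker compose down -v removes the containers and the named volumes in one step, and you can remove the downloaded images as well to reclaim their disk space.
# One-step cleanup (also deletes the volumes)
docker compose down -v
# Optionally remove the images too
docker image rm ollama/ollama:latest ghcr.io/open-webui/open-webui:latest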
Wrap-up
You have a modern local AI stack: Ollama for fast model serving and Open WebUI for a friendly, multi-model chat interface. With Docker, updates are quick and backups are simple. Add your favorite models, tune system prompts, and integrate the Ollama API into your apps—all without sending data to the cloud.