Overview
This tutorial shows how to deploy a private, fast, and GPU-accelerated local AI chat using Ollama and Open WebUI on Ubuntu 24.04. Ollama manages large language models (LLMs) such as Llama 3, while Open WebUI provides a friendly web interface for chatting, prompt management, and basic workflow tools. We will use Docker to keep the setup clean and reproducible, with steps for both CPU-only and NVIDIA GPU acceleration.
Prerequisites
OS: Ubuntu 24.04 (22.04 also works).
Access: a user with sudo privileges.
Hardware: 8 GB RAM minimum for small models; an NVIDIA GPU with at least 8 GB VRAM is optional but speeds up inference.
Network: internet access to pull images and models.
Step 1 — Install Docker and basic tools
Update the system and install Docker using the convenience script, then add your user to the docker group so you can run containers without sudo.

sudo apt update && sudo apt -y upgrade
sudo apt -y install curl ca-certificates gnupg lsb-release
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER && newgrp docker
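To confirm the installation before moving on, a quick check is to print the Docker version and run the standard hello-world test image, which pulls a tiny image, prints a confirmation message, and exits. If the docker group change has not taken effect yet, log out and back in first.

docker --version
docker run --rm hello-world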
Step 2 — (Optional) Enable NVIDIA GPU acceleration
If you have an NVIDIA GPU, install the recommended driver, then the NVIDIA Container Toolkit so Docker can use your GPU.
Install the driver:

sudo ubuntu-drivers autoinstall && sudo reboot

After the reboot, verify the driver is loaded:

nvidia-smi

Install the NVIDIA Container Toolkit:

sudo bash -c 'curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg'
sudo bash -c 'curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#" > /etc/apt/sources.list.d/nvidia-container-toolkit.list'
sudo apt update && sudo apt -y install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
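Before launching any GPU containers, you can check that Docker registered the nvidia runtime; the exact output wording varies by Docker version, but the runtime name should appear.

docker info | grep -i nvidia
# Expect "nvidia" to be listed among the available runtimes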
Step 3 — Create a dedicated Docker network and volumes
Networking and volumes keep the services isolated and the data persistent.

docker network create ai
docker volume create ollama
docker volume create open-webui
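If you want to double-check that everything was created, listing the objects is enough; the names below match the ones created above.

docker network ls | grep ai
docker volume ls | grep -E 'ollama|open-webui'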
Step 4 — Start Ollama (CPU or GPU)
CPU-only:

docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest

GPU-enabled (NVIDIA):

docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 --gpus all -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -v ollama:/root/.ollama ollama/ollama:latest
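Either way, it is worth confirming that the Ollama API answers before continuing; the root endpoint replies with a short status message, and the container logs show any startup errors.

curl http://127.0.0.1:11434/
# Should return: Ollama is running
docker logs ollama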
Step 5 — Download a model
Pull a model inside the Ollama container. Llama 3 8B works well on modern CPUs and mid-range GPUs; Mistral 7B is another good option.

docker exec -it ollama ollama pull llama3:8b

Alternative models:

docker exec -it ollama ollama pull mistral:7b
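To confirm the download, list the locally stored models; for a quick end-to-end test without opening the web interface, you can also send a one-off prompt to the Ollama API (the prompt text here is only an example).

docker exec -it ollama ollama list
curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Say hello in one sentence.", "stream": false}'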
Step 6 — Launch Open WebUI
Run the Open WebUI container and point it at the Ollama service via the Docker network.

docker run -d --name open-webui --restart unless-stopped --network ai -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_API_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest
Open a browser to http://SERVER_IP:3000. Create your admin account on first login. In Settings, select the downloaded model (for example, llama3:8b) and start chatting.
Security hardening
If you only need local access, bind ports to localhost by replacing -p 3000:8080 and -p 11434:11434 with -p 127.0.0.1:3000:8080 and -p 127.0.0.1:11434:11434. For remote access, put a reverse proxy like Nginx or Caddy in front with HTTPS and authentication. Also consider firewall rules to restrict inbound connections to required IPs.
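As a sketch of the firewall idea, and assuming the containers are bound to 127.0.0.1 as described above with a reverse proxy terminating TLS, the ufw rules below allow only SSH and HTTP/HTTPS inbound. Note that ports Docker publishes on all interfaces can bypass ufw, which is another reason to prefer the localhost bindings.

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
# The reverse proxy (Nginx or Caddy) handles TLS on 80/443
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable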
Model and app updates
Update images and models periodically. Pull the latest images, then recreate the containers:

docker pull ollama/ollama:latest && docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama && docker rm open-webui ollama

Then re-run the docker run commands from Steps 4 and 6; your data persists in the named volumes.

Update models as needed:

docker exec -it ollama ollama pull llama3:8b
Backup and restore
Back up the persistent data via the volumes. From a safe working directory:

docker run --rm -v ollama:/data -v $PWD:/backup alpine tar czf /backup/ollama-models.tgz -C /data .
docker run --rm -v open-webui:/data -v $PWD:/backup alpine tar czf /backup/open-webui-data.tgz -C /data .

Restore by creating the volumes and extracting the archives back into them, as sketched below.
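A minimal restore sketch, assuming the archives created above are in the current directory and the target volumes do not yet contain data:

docker volume create ollama
docker volume create open-webui
docker run --rm -v ollama:/data -v $PWD:/backup alpine tar xzf /backup/ollama-models.tgz -C /data
docker run --rm -v open-webui:/data -v $PWD:/backup alpine tar xzf /backup/open-webui-data.tgz -C /data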
Troubleshooting
GPU not detected in containers: Ensure nvidia-smi works on the host, the NVIDIA Container Toolkit is installed, and you used --gpus all. Restart Docker after configuring the toolkit. Verify with docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi.
Port conflicts: If ports 3000 or 11434 are in use, change the host ports (e.g., -p 3001:8080).
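If you are unsure what is already listening on a port, ss shows the owning process (3000 and 11434 below are just the ports used in this tutorial):

sudo ss -ltnp | grep -E ':3000|:11434'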
Slow performance: Use GPU acceleration when available, close background apps, and choose smaller models such as llama3:8b or mistral:7b. On CPU, disable power saving and use a recent CPU with AVX2 support.
Clean removal: stop and remove everything with:

docker rm -f open-webui ollama
docker volume rm open-webui ollama
docker network rm ai
What you built
You now have a modern, private AI stack running locally: Ollama serving LLMs and Open WebUI delivering a polished chat interface. It is portable via Docker, secure when bound to localhost or proxied with TLS, and easy to update. This setup is ideal for prototyping prompts, experimenting with different models, and keeping sensitive data on your own machine.