Overview
Running a private AI assistant on your own machine is now practical, fast, and secure. In this step-by-step guide, you will deploy Ollama (to run large language models locally) and Open WebUI (a clean chat interface) using Docker. The tutorial covers CPU-only mode and GPU acceleration with both NVIDIA and AMD ROCm, so you get the best performance out of your hardware. By the end, you will have a persistent setup that auto-starts on boot, supports one-click model management, and is easy to update.
Prerequisites
- A 64-bit Linux host (Ubuntu 22.04+ recommended) with internet access.
- Docker Engine installed and running.
- For NVIDIA GPUs: proprietary drivers and the NVIDIA Container Toolkit.
- For AMD GPUs: ROCm-capable GPU with ROCm drivers (5.7+).
- Open ports 11434 (Ollama API) and 3000 (Open WebUI). You can change ports if they collide with other services.
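Before continuing, you can quickly confirm that the Docker daemon is up and that both ports are free. A minimal check, assuming iproute2's ss is available:

docker info --format '{{.ServerVersion}}'
ss -ltn '( sport = :11434 or sport = :3000 )'

If the second command prints no listening sockets, the ports are free.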
Step 1 — Install Docker (Ubuntu quick method)
If Docker is not installed, run:
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
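To verify the installation, run the hello-world image; optionally add your user to the docker group so you can drop sudo (this takes effect after you log out and back in):

sudo docker run --rm hello-world
sudo usermod -aG docker $USER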
Step 2 — Enable GPU in Docker (optional but recommended)
NVIDIA: Install the NVIDIA driver and container toolkit, then restart Docker:
sudo apt-get install -y nvidia-driver-535    # or a newer driver series
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
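Before moving on, a quick sanity check on the host confirms the driver is loaded:

nvidia-smi

If this fails, fix the driver installation first; the container-side test in Troubleshooting below will not work otherwise.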
AMD ROCm: Install ROCm drivers appropriate for your GPU. Ensure devices /dev/kfd and /dev/dri exist. No extra Docker runtime is needed; you will pass devices to the container.
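A quick way to confirm the ROCm device nodes are present before starting the container:

ls -l /dev/kfd /dev/dri

Both paths must exist; /dev/dri is a directory containing the card and render nodes the container will use.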
Step 3 — Create a dedicated Docker network
Create a private bridge network so containers can discover each other by name:
docker network create ai
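You can verify the network exists and uses the bridge driver:

docker network inspect ai --format '{{.Name}}: {{.Driver}}'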
Step 4 — Start Ollama (choose one)
NVIDIA GPU:
docker run -d --name ollama --gpus=all --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
AMD ROCm GPU:
docker run -d --name ollama --device /dev/kfd --device /dev/dri --group-add video --ipc=host --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:rocm
CPU-only:
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
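Whichever variant you chose, confirm the API is answering before moving on:

curl http://localhost:11434
docker logs --tail 20 ollama

The curl command should print "Ollama is running", and the logs show whether a GPU was detected at startup.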
Step 5 — Start Open WebUI and connect it to Ollama
Run Open WebUI on port 3000 and point it at the Ollama API using the container name:
docker run -d --name open-webui --restart unless-stopped --network ai -p 3000:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main
Open a browser and visit http://YOUR_SERVER_IP:3000. On first launch, create your admin account. In Settings, disable public sign-ups if this instance is exposed to untrusted networks.
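If the page does not load, check the container logs and the HTTP response code:

docker logs --tail 20 open-webui
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000

A 200 means the web server is up; anything else usually points at a port conflict or a container that is still starting.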
Step 6 — Download a model and chat
You can pull models from within Open WebUI's interface, or via the CLI:
docker exec -it ollama ollama pull llama3.1:8b
For lower VRAM, try quantized variants (for example: llama3.1:8b-instruct-q4_0). After the download completes, select the model in Open WebUI and start chatting locally.
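You can also confirm the model is installed and responding directly against the Ollama API:

docker exec -it ollama ollama list
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in five words.", "stream": false}'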
Security and access tips
- Keep ports private if possible; bind to localhost and put an authenticated reverse proxy (Nginx, Caddy, or Traefik) in front if you expose the instance to the internet (see the example after this list).
- Regularly update images and disable open registration in Open WebUI.
- Use Docker volumes (already configured) to persist models and settings across updates.
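For example, to publish both services on the loopback interface only, change the -p flags in Steps 4–5 as follows; a reverse proxy on the same host can then terminate TLS and handle authentication in front of these ports:

-p 127.0.0.1:11434:11434
-p 127.0.0.1:3000:8080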
Updating to the latest versions
docker pull ollama/ollama:latest    # or ollama/ollama:rocm for AMD
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui ollama && docker rm open-webui ollama
Re-run the docker run commands from Steps 4–5. Your data persists in volumes.
Troubleshooting
- GPU not detected (NVIDIA): test with docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi.
- GPU not detected (AMD): ensure /dev/kfd and /dev/dri exist and your user is in the video group (the container already adds it).
- Slow or failed model pulls: check DNS/proxy and rerun the pull. Try a smaller or more quantized model.
- Port conflicts: change -p 11434:11434 or -p 3000:8080 to other free host ports.
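For example, to serve Open WebUI on host port 3001 instead, only the left-hand side of the mapping changes:

docker run -d --name open-webui --restart unless-stopped --network ai -p 3001:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main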
Uninstall and cleanup
docker rm -f open-webui ollama
docker volume rm open-webui ollama
docker network rm ai
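If you also want to reclaim the disk space used by the images themselves:

docker image rm ollama/ollama:latest ghcr.io/open-webui/open-webui:main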
You now have a flexible, private AI stack that runs entirely on your machine. Swap models as needed, tune quantization for your hardware, and keep your data under your control.