Overview
Running a private AI assistant on your own machine is now practical, fast, and secure. In this step-by-step guide, you will deploy Ollama (to run large language models locally) and Open WebUI (a clean chat interface) using Docker. The tutorial covers CPU-only mode and GPU acceleration with both NVIDIA and AMD ROCm, so you get the best performance out of your hardware. By the end, you will have a persistent setup that auto-starts on boot, supports one-click model management, and is easy to update.
Prerequisites
- A 64-bit Linux host (Ubuntu 22.04+ recommended) with internet access.
- Docker Engine installed and running.
- For NVIDIA GPUs: proprietary drivers and the NVIDIA Container Toolkit.
- For AMD GPUs: ROCm-capable GPU with ROCm drivers (5.7+).
- Open ports 11434 (Ollama API) and 3000 (Open WebUI). You can change ports if they collide with other services.
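Before continuing, you can quickly confirm that the Docker daemon is up and that both ports are free. A minimal check, assuming iproute2's ss is available:

docker info --format '{{.ServerVersion}}'
ss -ltn '( sport = :11434 or sport = :3000 )'

If the second command prints no listening sockets, the ports are free.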
Step 1 — Install Docker (Ubuntu quick method)
If Docker is not installed, run:
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
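To verify the installation, run the hello-world image; optionally add your user to the docker group so you can drop sudo (this takes effect after you log out and back in):

sudo docker run --rm hello-world
sudo usermod -aG docker $USER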
Step 2 — Enable GPU in Docker (optional but recommended)
NVIDIA: Install the NVIDIA driver and container toolkit, then restart Docker:
sudo apt-get install -y nvidia-driver-535    # or a newer driver series
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
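Before moving on, a quick sanity check on the host confirms the driver is loaded:

nvidia-smi

If this fails, fix the driver installation first; the container-side test in Troubleshooting below will not work otherwise.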
AMD ROCm: Install ROCm drivers appropriate for your GPU. Ensure devices /dev/kfd and /dev/dri exist. No extra Docker runtime is needed; you will pass devices to the container.
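A quick way to confirm the ROCm device nodes are present before starting the container:

ls -l /dev/kfd /dev/dri

Both paths must exist; /dev/dri is a directory containing the card and render nodes the container will use.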
Step 3 — Create a dedicated Docker network
Create a private bridge network so containers can discover each other by name:
docker network create ai
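You can verify the network exists and uses the bridge driver:

docker network inspect ai --format '{{.Name}}: {{.Driver}}'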
Step 4 — Start Ollama (choose one)
NVIDIA GPU:
docker run -d --name ollama --gpus=all --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
AMD ROCm GPU:
docker run -d --name ollama --device /dev/kfd --device /dev/dri --group-add video --ipc=host --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:rocm
CPU-only:
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
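Whichever variant you chose, confirm the API is answering before moving on:

curl http://localhost:11434
docker logs --tail 20 ollama

The curl command should print "Ollama is running", and the logs show whether a GPU was detected at startup.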
Step 5 — Start Open WebUI and connect it to Ollama
Run Open WebUI on port 3000 and point it at the Ollama API using the container name:
docker run -d --name open-webui --restart unless-stopped --network ai -p 3000:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main
Open a browser and visit http://YOUR_SERVER_IP:3000. On first launch, create your admin account. In Settings, disable public sign-ups if this instance is exposed to untrusted networks.
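If the page does not load, check the container logs and the HTTP response code:

docker logs --tail 20 open-webui
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000

A 200 means the web server is up; anything else usually points at a port conflict or a container that is still starting.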
Step 6 — Download a model and chat
You can pull models from within Open WebUI's interface, or via the CLI:
docker exec -it ollama ollama pull llama3.1:8b
For lower VRAM, try quantized variants (for example: llama3.1:8b-instruct-q4_0). After the download completes, select the model in Open WebUI and start chatting locally.
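You can also confirm the model is installed and responding directly against the Ollama API:

docker exec -it ollama ollama list
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in five words.", "stream": false}'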
Security and access tips
- Keep ports private if possible; bind to localhost and put an authenticated reverse proxy (Nginx, Caddy, or Traefik) in front if you expose the instance to the internet (see the example after this list).
- Regularly update images and disable open registration in Open WebUI.
- Use Docker volumes (already configured) to persist models and settings across updates.
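For example, to publish both services on the loopback interface only, change the -p flags in Steps 4–5 as follows; a reverse proxy on the same host can then terminate TLS and handle authentication in front of these ports:

-p 127.0.0.1:11434:11434
-p 127.0.0.1:3000:8080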
Updating to the latest versions
docker pull ollama/ollama:latest    # or ollama/ollama:rocm for AMD
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui ollama && docker rm open-webui ollama
Re-run the docker run commands from Steps 4–5. Your data persists in volumes.
Troubleshooting
- GPU not detected (NVIDIA): test with docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi.
- GPU not detected (AMD): ensure /dev/kfd and /dev/dri exist and your user is in the video group (the container already adds it).
- Slow or failed model pulls: check DNS/proxy and rerun the pull. Try a smaller or more quantized model.
- Port conflicts: change -p 11434:11434 or -p 3000:8080 to other free host ports.
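For example, to serve Open WebUI on host port 3001 instead, only the left-hand side of the mapping changes:

docker run -d --name open-webui --restart unless-stopped --network ai -p 3001:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main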
Uninstall and cleanup
docker rm -f open-webui ollama
docker volume rm open-webui ollama
docker network rm ai
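If you also want to reclaim the disk space used by the images themselves:

docker image rm ollama/ollama:latest ghcr.io/open-webui/open-webui:main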
You now have a flexible, private AI stack that runs entirely on your machine. Swap models as needed, tune quantization for your hardware, and keep your data under your control.