Run Local AI with Ollama and Open WebUI: GPU-Accelerated Setup on Windows and Linux with Docker

Local large language models (LLMs) have matured to the point where you can run fast, private, and cost-effective AI on your own computer or server. In this step-by-step guide, you will deploy Ollama (the LLM backend) and Open WebUI (a sleek web interface) using Docker, with optional GPU acceleration on both Windows and Linux. This stack lets you chat with models like Llama 3, Phi-4, or Mistral, completely on your hardware.

By the end, you will have a browser-based interface, persistent model storage, and a clean way to update or back up your local AI environment. The instructions use plain language and focus on practical, repeatable steps.

What You Will Build

You will run two containers on the same Docker network: Ollama exposes an API on port 11434 and performs all model work, while Open WebUI listens on port 3000 and connects to Ollama. You will enable GPU acceleration (NVIDIA or AMD) when available to dramatically improve performance.

Prerequisites

- A 64-bit installation of Windows 10/11 (with WSL 2) or a modern Linux distribution (Ubuntu, Debian, CentOS, RHEL).
- Docker installed (Docker Desktop on Windows, Docker Engine on Linux).
- Optional GPU: NVIDIA (CUDA) or AMD (ROCm) with up-to-date drivers. CPU-only also works, but is slower.
- 16 GB of RAM recommended; 10–40+ GB of free disk space depending on model size.

Step 1 – Install Docker

Windows: Install Docker Desktop and enable WSL 2 integration. In Settings, ensure “Use the WSL 2 based engine” is on. Update your GPU driver from NVIDIA/AMD. For NVIDIA, CUDA is not required on Windows for Docker Desktop; the latest Game Ready/Studio drivers are enough.
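
For a quick sanity check after installation, the commands below (a minimal sketch, assuming an NVIDIA GPU and the WSL 2 backend) can be run from PowerShell:

# Confirm WSL 2 is installed and healthy
wsl --status

# Ask a throwaway container for the GPU; you should see the nvidia-smi table
docker run --rm --gpus=all ubuntu nvidia-smi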

Linux: Install Docker from your distribution’s repository or Docker’s official repo. Add your user to the docker group and log out/in. Example (Ubuntu):

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
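
Once you have logged out and back in so the group change takes effect, a quick smoke test confirms Docker works without sudo:

# Print the Docker version and run a tiny test container
docker --version
docker run --rm hello-world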

Step 2 – Enable GPU Acceleration (Optional but Recommended)

NVIDIA on Linux: Install the NVIDIA Container Toolkit to pass your GPU into containers.

# Add the NVIDIA container toolkit repo (Ubuntu example)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
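
To confirm the toolkit is wired up, run a throwaway container and call nvidia-smi from inside it; the toolkit injects the driver utilities automatically. A minimal check, assuming nvidia-smi already works on the host:

docker run --rm --gpus=all ubuntu nvidia-smi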

AMD on Linux (ROCm): Install the latest AMDGPU/ROCm stack. To give containers access, pass /dev/kfd and /dev/dri and add the video group; the Ollama run step below shows the exact flags and uses the ROCm build of the Ollama image.
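
Before starting the container, it helps to confirm the ROCm device nodes exist on the host. A quick check (group requirements can vary by distribution):

# Both device nodes must be present for the container flags below to work
ls -l /dev/kfd /dev/dri

# Your user typically needs to be in the video (and sometimes render) group
groups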

Windows: Docker Desktop exposes the GPU automatically when the host has a compatible driver. Ensure your GPU driver is up to date and “Use the WSL 2 based engine” is enabled.

Step 3 – Start Ollama (LLM Backend)

Create a Docker network and a persistent volume for models, then start the Ollama container. Use the NVIDIA command if you have an NVIDIA GPU, the ROCm command if you have an AMD GPU, or the CPU-only command otherwise.

# Common network and volumes
docker network create llmnet
docker volume create ollama

# NVIDIA GPU (Linux or Windows with Docker Desktop)
docker run -d --name ollama \
  --network llmnet \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest

# AMD GPU with ROCm (Linux); note the :rocm image tag
docker run -d --name ollama \
  --network llmnet \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm

# CPU-only (no GPU flags needed)
docker run -d --name ollama \
  --network llmnet \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest

Verify Ollama is live:

curl http://localhost:11434/api/tags
# or
docker logs -f ollama

Step 4 – Start Open WebUI (Front-End)

Open WebUI connects to the Ollama API and gives you a beautiful chat interface. Map port 3000 for access and point it to the Ollama container over the private network.

docker volume create open-webui

docker run -d --name open-webui \
  --network llmnet \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

Open your browser at http://localhost:3000 and follow the first-run prompts. If you are on a server, replace localhost with the server’s IP or hostname.
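
If the server sits behind a firewall, remember to open the published port. For example, with ufw on Ubuntu (adjust the port if you changed the mapping):

sudo ufw allow 3000/tcp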

Step 5 – Pull and Test a Model

Use Ollama to download a model. Smaller 7–8B models are a good starting point. You can pull directly from the container or from the WebUI Models page.

# Examples (choose one)
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama pull phi3:mini
docker exec -it ollama ollama pull mistral:7b

After the download, open Open WebUI and start a new chat. Pick the model you pulled and send a test prompt. If tokens stream quickly, your GPU is active; if generation is slow, you are probably running on the CPU.
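
To confirm where the model is actually running, ask Ollama which models are loaded and how they are being served; on an NVIDIA host you can also watch VRAM usage. A quick check (the nvidia-smi line assumes an NVIDIA GPU):

# The output shows each loaded model and whether it runs on GPU or CPU
docker exec -it ollama ollama ps

# VRAM usage should rise while a prompt is generating
nvidia-smi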

Step 6 – Secure, Persist, and Back Up

Enable authentication in Open WebUI and control who can sign up. You can preconfigure this behavior with environment variables; for example, disable new sign-ups and set an admin contact email (variable names can change between releases, so check the Open WebUI documentation for the current list).

# Stop and re-create Open WebUI with tighter auth (example)
docker rm -f open-webui
docker run -d --name open-webui \
  --network llmnet \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -e ENABLE_SIGNUP=false \
  -e ADMIN_EMAIL=admin@example.com \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

To back up models and chat history, archive the Docker volumes. This keeps your setup portable.

# Backup Ollama models
docker run --rm -v ollama:/data -v "$PWD":/backup alpine \
  tar czf /backup/ollama-volume-backup.tgz -C /data .

# Backup Open WebUI data
docker run --rm -v open-webui:/data -v "$PWD":/backup alpine \
  tar czf /backup/open-webui-volume-backup.tgz -C /data .
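
Restoring is the reverse: stop the containers, then unpack the archive into the (new or existing) volume. A minimal sketch using the same archive name as above:

# Restore the Ollama models volume from the backup archive
docker run --rm -v ollama:/data -v "$PWD":/backup alpine \
  tar xzf /backup/ollama-volume-backup.tgz -C /data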

To update, pull the latest images and recreate:

docker pull ollama/ollama:latest   # or ollama/ollama:rocm if you use the AMD image
docker pull ghcr.io/open-webui/open-webui:main
docker rm -f open-webui ollama
# Re-run the "docker run" commands from Steps 3 and 4

Performance Tips

- Prefer smaller, quantized models (e.g., 7–8B) if you have limited VRAM. Many Ollama models publish quantized tags that fit 8–12 GB GPUs; see the example after this list.
- Close other GPU-heavy apps to free VRAM.
- Keep GPU drivers and Docker up to date for the best performance and compatibility.
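
For example, quantized builds are pulled like any other tag. The tag below is illustrative; check the Tags tab for your model on ollama.com for the variants that actually exist:

# A 4-bit quantized 8B model typically fits in 8–12 GB of VRAM
docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_K_M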

Troubleshooting

Open WebUI cannot reach Ollama: Make sure both containers share the same network and the URL is correct: http://ollama:11434. Run docker logs open-webui for connection errors.
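
One quick way to check, assuming the network name from Step 3, is to list the containers attached to it:

# Both ollama and open-webui should appear in the output
docker network inspect llmnet --format '{{range .Containers}}{{.Name}} {{end}}'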

“no gpus found” or slow generation: On Linux with NVIDIA, confirm nvidia-smi works on the host and that nvidia-container-toolkit is installed. Run the container with --gpus=all. On AMD, pass --device=/dev/kfd --device=/dev/dri --group-add video. On Windows, ensure Docker Desktop is using WSL2 and that your GPU driver is current.

Port already in use: Adjust published ports, e.g., use -p 3001:8080 or -p 11435:11434, and update the URLs accordingly.

Out of memory (VRAM): Pick a smaller or more heavily quantized model. Close other GPU apps and try again.

What’s Next

With Ollama and Open WebUI running, you can add multiple models, enable embeddings and RAG, or connect tools and function calling. This setup gives you a private, fast local AI workspace that you can back up and upgrade in minutes—all without sending your data to the cloud.
