Deploy Ollama and Open WebUI on Windows 11 with WSL2 and NVIDIA GPU Acceleration (2025 Guide)

Overview

This step-by-step guide shows how to run local large language models with GPU acceleration on a Windows 11 machine using WSL2 (Ubuntu), Ollama, and Open WebUI. You will install WSL, enable NVIDIA GPU pass-through, run Ollama to host models like Llama 3.1, and add a friendly browser interface via Open WebUI. The result is a fast, private AI workstation with minimal overhead.

Prerequisites

You need Windows 11 with WSL2 enabled, an NVIDIA GPU with a recent driver (version 555 or newer recommended), and at least 16 GB RAM. If you use Docker Desktop you can keep it installed, but this guide keeps things simple by running both Ollama and Open WebUI directly in WSL.

1) Install WSL2 and Ubuntu

Open PowerShell as Administrator and run: wsl --install -d Ubuntu-24.04. Reboot if asked, then finish the Ubuntu setup (username and password). Inside Ubuntu, update packages with: sudo apt update && sudo apt upgrade -y. This gives you a clean base Linux environment.
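For reference, the full sequence looks like this (assuming the Ubuntu-24.04 image appears in your wsl --list --online output):

# In PowerShell (run as Administrator)
wsl --install -d Ubuntu-24.04

# Inside the new Ubuntu shell, after creating your user
sudo apt update && sudo apt upgrade -y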

2) Enable GPU for WSL

Install the latest NVIDIA Windows driver from nvidia.com that supports CUDA in WSL. After installation, reboot Windows. You do not usually need the CUDA toolkit inside WSL; the Windows driver exposes the CUDA runtime to Linux automatically. If a model later fails to use the GPU, you can install the CUDA userspace libraries as a fallback (sudo apt install -y nvidia-cuda-toolkit), but never install a Linux NVIDIA driver inside WSL; the Windows driver is the only one you need.
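A quick sanity check, assuming your Windows driver already supports WSL: the same nvidia-smi tool should work from inside Ubuntu, because the driver maps it into the distro.

# Inside Ubuntu (WSL); should print your GPU, driver version, and CUDA version
nvidia-smi

If this prints your GPU name, pass-through is working and you can skip the toolkit fallback entirely.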

3) Turn on systemd in WSL (recommended)

Enabling systemd lets you manage services cleanly. In Ubuntu, run: sudo nano /etc/wsl.conf and add: [boot] on one line and systemd=true on the next. Save and exit. In PowerShell run: wsl --shutdown, then start Ubuntu again from the Start menu. Verify with: systemctl --version (it should show a version, not an error).
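The finished configuration and restart sequence look like this:

# Contents of /etc/wsl.conf
[boot]
systemd=true

# In PowerShell, restart the distro so the change takes effect
wsl --shutdown

# Back in Ubuntu, confirm systemd is active
systemctl --version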

4) Install Ollama

In Ubuntu, install Ollama with: curl -fsSL https://ollama.com/install.sh | sh. If systemd is enabled, start and enable the service: sudo systemctl enable --now ollama. Otherwise run it manually in the background: nohup ollama serve >/dev/null 2>&1 &. Ollama listens on port 11434 by default.
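Put together, the install and service setup is:

# Install Ollama with the official install script
curl -fsSL https://ollama.com/install.sh | sh

# With systemd enabled, run it as a service
sudo systemctl enable --now ollama

# Without systemd, run it in the background instead
nohup ollama serve >/dev/null 2>&1 &

# Confirm it is listening on the default port
curl http://127.0.0.1:11434/api/version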

5) Pull a model and test

Download a capable model such as Llama 3.1 8B by running: ollama pull llama3.1:8b. When it completes, test a quick prompt: ollama run llama3.1:8b "Write a haiku about WSL2." To confirm GPU usage, keep a Windows terminal open with nvidia-smi and watch for activity while the prompt runs. If you want to force CPU-only inference temporarily, hide the GPU from the server by setting CUDA_VISIBLE_DEVICES to an invalid ID (for example, CUDA_VISIBLE_DEVICES=-1) before starting ollama serve.
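A typical first test, end to end (the API call assumes the default port and the llama3.1:8b tag pulled above):

# Pull the model and run a prompt interactively
ollama pull llama3.1:8b
ollama run llama3.1:8b "Write a haiku about WSL2."

# Or hit the REST API directly
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a haiku about WSL2.",
  "stream": false
}'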

6) Install Open WebUI (no Docker required)

Open WebUI provides a clean browser interface for Ollama. Install Python and pip if needed: sudo apt install -y python3-pip python3-venv. Ubuntu 24.04 marks the system Python as externally managed, so a bare pip install will be refused; install Open WebUI inside a virtual environment instead (pipx also works). Inside the activated venv, run: pip install --upgrade pip && pip install open-webui. Start it with: open-webui serve --host 0.0.0.0 --port 3000. If the command is not found, check that the virtual environment is activated (or, for pipx, that ~/.local/bin is on your PATH).
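A sketch of the venv-based install, assuming you keep the environment under ~/open-webui (the path is arbitrary). Note that Open WebUI's documentation recommends Python 3.11, so if pip fails on a newer interpreter, that is the first thing to check.

# Create and activate a dedicated virtual environment
python3 -m venv ~/open-webui
source ~/open-webui/bin/activate

# Install and start Open WebUI
pip install --upgrade pip
pip install open-webui
open-webui serve --host 0.0.0.0 --port 3000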

7) Connect Open WebUI to Ollama

By default, Open WebUI will attempt to connect to a local Ollama server at http://127.0.0.1:11434. If it does not auto-detect, open the Web UI in your Windows browser at http://localhost:3000, go to Settings, then Connections, and set the base URL to http://127.0.0.1:11434. Save the setting and try a prompt using the model you pulled earlier.
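If you prefer to set the connection up front rather than through the settings page, Open WebUI also reads the OLLAMA_BASE_URL environment variable at startup:

# Point Open WebUI at the local Ollama server before starting it
OLLAMA_BASE_URL=http://127.0.0.1:11434 open-webui serve --host 0.0.0.0 --port 3000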

8) Make services persistent

If systemd is enabled, Ollama already runs as a service. To run Open WebUI as a user service, create a file: mkdir -p ~/.config/systemd/user and then nano ~/.config/systemd/user/open-webui.service. Paste the following lines:

[Unit]
Description=Open WebUI
After=network.target

[Service]
ExecStart=/usr/bin/env open-webui serve --host 0.0.0.0 --port 3000
Restart=always

[Install]
WantedBy=default.target

Save the file, then run: systemctl --user daemon-reload && systemctl --user enable --now open-webui. Ensure lingering is on so it starts after reboot: loginctl enable-linger $USER. Now both services will start automatically whenever WSL launches.
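One caveat: if you installed Open WebUI into a virtual environment as sketched in step 6, /usr/bin/env will not find the open-webui binary from a user service, because the service does not inherit your activated venv. In that case, point ExecStart at the binary inside the venv (the ~/open-webui path below is just the example location from earlier; %h is systemd's shorthand for your home directory):

# In ~/.config/systemd/user/open-webui.service, under [Service]
ExecStart=%h/open-webui/bin/open-webui serve --host 0.0.0.0 --port 3000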

9) Performance and resource tips

If you have a large GPU (24 GB+ of VRAM), you can experiment with bigger models, but keep expectations realistic: llama3.1:70b needs roughly 40 GB even at 4-bit quantization, so on consumer cards it will partially offload to the CPU and run slowly. For mid-range GPUs (8–16 GB), 7B–8B models are the sweet spot. You can reduce VRAM use by offloading fewer layers to the GPU with the num_gpu model parameter (set it with /set parameter num_gpu inside an ollama run session, or as an option in the API). Keep enough disk space under your WSL distro; models can be multiple gigabytes each. You can remove unused models with: ollama rm MODEL_NAME.
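Useful housekeeping commands, plus the layer-offload parameter mentioned above (the value 20 is only an example; tune it to your VRAM):

# See what is installed and how large each model is
ollama list

# Remove a model you no longer need
ollama rm llama3.1:70b

# Inside an interactive session, offload only 20 layers to the GPU
ollama run llama3.1:8b
>>> /set parameter num_gpu 20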

10) Troubleshooting

If Open WebUI cannot connect, confirm Ollama is running with: curl http://127.0.0.1:11434/api/tags. If the GPU is not used, update to the latest NVIDIA driver, reboot, and re-test. From the Windows side, run nvidia-smi while prompting to confirm activity. If you see CUDA errors, install the CUDA userspace libraries in WSL (nvidia-cuda-toolkit) and try again. On AMD GPUs, GPU acceleration in WSL is limited; use CPU or run Ollama natively on Windows if it supports your hardware.
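When something misbehaves, these checks usually narrow it down quickly (journalctl only applies if you enabled systemd in step 3):

# Is the Ollama API up, and does it see your models?
curl http://127.0.0.1:11434/api/tags

# Is the GPU visible from inside WSL?
nvidia-smi

# Recent Ollama service logs (systemd setups)
journalctl -u ollama --no-pager -n 50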

Security notes

By default, Open WebUI’s first account becomes admin. Set a strong password and avoid exposing port 3000 to the internet. If you later publish this service beyond localhost, place it behind a reverse proxy with TLS and enable authentication. Consider binding Open WebUI to 127.0.0.1 only and accessing it via Windows browser for local use.
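For local-only use, a sketch of the more conservative binding (swap this into the serve command or the unit file's ExecStart); in the default NAT networking mode, WSL's localhost forwarding should still let the Windows browser reach it at http://localhost:3000:

# Listen on loopback only
open-webui serve --host 127.0.0.1 --port 3000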

Wrap-up

You have a private, GPU-accelerated AI stack on Windows 11 using WSL2, Ollama, and Open WebUI. This setup is fast, secure, and flexible, letting you try different models without cloud costs. Keep your drivers updated, manage models carefully to save disk, and enjoy local inference with a smooth web experience.
