How to Install Ollama and Open WebUI with GPU Acceleration on Ubuntu and Windows (2025 Guide)

Overview

This step-by-step guide shows how to run private, local large language models with Ollama and a modern chat interface using Open WebUI. We will cover installing Ollama on Ubuntu and Windows, enabling GPU acceleration, pulling popular models like Llama 3, and deploying Open WebUI with Docker so you can chat, run tools, and manage prompts from a browser. The result is a fast, secure, and offline-friendly AI stack that you control.

Prerequisites

You will need a 64-bit system, administrator privileges, and at least 16 GB of RAM for 7B–8B models. GPU acceleration is recommended for speed: keep your NVIDIA/AMD/Intel graphics drivers up to date. Ollama listens on port 11434 by default, and Open WebUI will run on port 3000. Ensure your firewall allows local access or your chosen LAN range.
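
For example, on Ubuntu you can restrict access to your LAN with ufw. This is only a sketch; the 192.168.1.0/24 subnet is a placeholder, so substitute your own range.

# Example only -- replace 192.168.1.0/24 with your actual LAN subnet
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp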

Step 1 — Install Ollama

Ubuntu 22.04/24.04: Install Ollama with the official script, which installs the binary and sets up a systemd service so Ollama starts at boot.

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
curl http://127.0.0.1:11434/api/version

You should see a version string from the last command. If not, check the service: sudo systemctl status ollama.

Windows 11/10: Install with the official installer from ollama.com or via Winget, then verify the local API.

winget install Ollama.Ollama
curl http://127.0.0.1:11434/api/version

On Windows, Ollama runs as a background app in the system tray for the signed-in user. If you use a third-party firewall, allow local traffic to port 11434.

Step 2 — Enable GPU Acceleration

GPU acceleration in Ollama is automatic when compatible drivers and runtimes are present. On Linux, install your vendor’s proprietary GPU driver. On Windows, use the latest Game Ready/Studio driver from the GPU vendor. After pulling a model and running a test prompt, check the Ollama logs: if they mention the GPU and token throughput is clearly better than a CPU-only run, acceleration is working.
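
A quick way to confirm this on Linux (the commands assume the systemd install from Step 1; nvidia-smi applies to NVIDIA GPUs only):

ollama ps                 # the PROCESSOR column should show GPU for a loaded model
journalctl -u ollama -f   # service logs mention the detected GPU and available VRAM
nvidia-smi                # NVIDIA only: VRAM usage should rise while a prompt runs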

If you suspect CPU fallback, update drivers, make sure your GPU has enough VRAM for the chosen model size, and try a smaller variant (for example, 8B instead of 13B). On laptops with hybrid graphics, set the app/GPU preferences so Ollama can use the discrete GPU.

Step 3 — Pull a Model and Test Locally

Pull a model using the Ollama CLI. Popular, high-quality choices include Llama 3 and Mistral. The first run downloads and prepares the weights; later runs load them from the local cache and start much faster.

# Examples (pick one)
ollama pull llama3:8b
ollama pull llama3.1:8b
ollama pull mistral:7b
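
To confirm what was downloaded, list your local models (the exact output of ollama show varies by Ollama version):

ollama list            # downloaded models and their sizes on disk
ollama show llama3:8b  # model details such as parameters and context length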

Now run a quick prompt:

ollama run llama3:8b
# At the prompt, type:
# What are three creative use cases for local AI at home?

If responses are slow or you see out-of-memory errors, switch to a smaller model or close GPU-intensive applications.
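
Ollama also exposes an HTTP API on port 11434, which is what Open WebUI will use in the next step. A minimal test with curl, assuming you pulled llama3:8b:

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Say hello in five words.",
  "stream": false
}'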

Step 4 — Deploy Open WebUI with Docker

Open WebUI adds a polished browser interface, prompt library, chat history, and features such as RAG (retrieval-augmented generation, which grounds answers in your own documents). We will connect it to your host’s Ollama instance. The following Docker Compose file works on Linux and Windows; on Windows, create docker-compose.yml in a text editor instead of using the shell heredoc shown below. It uses host.docker.internal to reach the host-based Ollama API and maps a named volume for persistent Open WebUI data.

mkdir -p ~/openwebui && cd ~/openwebui
cat > docker-compose.yml <<'YAML'
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui-data:/app/backend/data
    restart: unless-stopped
volumes:
  open-webui-data:
YAML

docker compose up -d

Open your browser and visit http://localhost:3000. Create your account when prompted, pick your default model (for example, llama3:8b), and send a test message. If the UI cannot connect, ensure the Ollama service is running and that your firewall allows local connections to port 11434.
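
If something looks off, a few quick checks from the host can help (the container name matches the Compose file above):

docker compose ps                        # the open-webui container should be running
docker logs -f open-webui                # watch startup logs for connection errors
curl http://127.0.0.1:11434/api/version  # confirm Ollama is reachable from the host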

Optional — Run Both Ollama and Open WebUI in Docker

If you prefer everything containerized, you can run Ollama and Open WebUI in the same Compose file. This is convenient on servers. GPU pass-through in Docker requires recent drivers and, on Linux, the NVIDIA Container Toolkit. When in doubt, keep Ollama native and only containerize Open WebUI, as shown above.
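
For reference, here is a minimal sketch of such a combined Compose file for an NVIDIA GPU. It assumes the NVIDIA Container Toolkit is installed; image tags, volume names, and the GPU reservation block may need adjusting for your setup.

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - open-webui-data:/app/backend/data
    restart: unless-stopped
volumes:
  ollama-data:
  open-webui-data: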

Security, Updates, and Backups

Do not expose ports 11434 or 3000 directly to the internet. If you need remote access, place Open WebUI behind a reverse proxy (Nginx, Caddy, or Traefik) with HTTPS and strong authentication, or publish it through a zero-trust tunnel. Inside Open WebUI, enable authentication and limit registration to trusted users. Keep Docker images current by pulling the latest tags and recreating containers. On Ubuntu, update Ollama by re-running the official install script; on Windows, the app prompts for updates, or upgrade via Winget. Back up your Ollama data directory (typically ~/.ollama, or /usr/share/ollama/.ollama when Ollama runs as the Ubuntu system service) and the open-webui-data volume to preserve models, chat history, and settings.
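
As a sketch, updating and backing up might look like this (run from the directory containing your Compose file; the volume and archive names are examples):

# Update Open WebUI to the latest image and recreate the container
docker compose pull && docker compose up -d

# Find the exact volume name (Compose prefixes it with the project name)
docker volume ls | grep open-webui-data

# Back up that volume to a tar archive in the current directory
# (replace openwebui_open-webui-data with the name reported above)
docker run --rm -v openwebui_open-webui-data:/data -v "$PWD":/backup alpine \
  tar czf /backup/open-webui-data.tar.gz -C /data .

# Back up local Ollama models and settings (adjust the path for your install)
tar czf ollama-backup.tar.gz -C "$HOME" .ollama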

Troubleshooting

If Open WebUI says “Cannot connect to Ollama,” verify the API at http://127.0.0.1:11434/api/version and confirm your Compose file includes extra_hosts with host-gateway on Linux. On Windows with Docker Desktop, host.docker.internal works out of the box. If GPU acceleration is missing, update drivers, reboot, and try a smaller model. When Docker containers fail to start, check logs with docker logs open-webui and make sure ports 3000 and 11434 are not in use by other applications.
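
On Linux, these checks cover the most common failure points (adjust service and container names if yours differ):

sudo systemctl status ollama               # is the Ollama service running?
curl http://127.0.0.1:11434/api/version    # does the API respond?
docker logs open-webui                     # container startup errors
sudo ss -ltnp | grep -E ':3000|:11434'     # which processes hold the ports?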

What You Can Do Next

With Ollama and Open WebUI running, you can add multiple models, create custom system prompts, and enable RAG by uploading PDFs or notes so the model answers with context from your documents. You can also script batch prompts via the Ollama HTTP API, integrate with automation tools, or point a browser extension to your local endpoint to replace cloud calls. The stack is private, fast, and easy to maintain—ideal for personal knowledge work or secure team deployments.
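
For example, a small shell loop can send a batch of prompts through the API's /api/generate endpoint; the model name and prompts below are placeholders.

# Run a few prompts in sequence against the local API (requires curl and jq)
for prompt in "Summarize my week" "Draft a packing list" "Explain RAG in one line"; do
  curl -s http://127.0.0.1:11434/api/generate \
    -d "{\"model\": \"llama3:8b\", \"prompt\": \"$prompt\", \"stream\": false}" \
    | jq -r '.response'
done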
