Local large language models (LLMs) are now practical for developers, researchers, and privacy-focused teams. In this step-by-step guide, you will install and run Ollama (LLM runtime) and Open WebUI (a modern chat interface) on Ubuntu 22.04/24.04, with optional NVIDIA GPU acceleration. The setup uses Docker for easy updates, isolation, and backups.
Why this stack?
Ollama makes downloading and running models simple, offering a fast API on your machine. Open WebUI provides a sleek, extensible web app for chatting with multiple models, managing prompts, and moderating access. Together, they create a private, cost-effective alternative to cloud AI services.
Prerequisites
You need an Ubuntu 22.04/24.04 system with internet access and a user with sudo rights. If you have an NVIDIA GPU (recommended), you can enable GPU acceleration for much faster inference. CPU-only works too—skip the GPU steps if you do not have a supported GPU.
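If you want to confirm the basics before you start, the quick checks below are a minimal sketch (assuming default Ubuntu tooling such as lsb_release is available) for verifying the release, the presence of an NVIDIA GPU, and free disk space.
# Confirm the Ubuntu release and CPU architecture
lsb_release -ds && uname -m
# Check for an NVIDIA GPU (only needed for the optional GPU steps)
lspci | grep -i nvidia
# Check free disk space; each model can take several GB
df -h /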
Step 1: Update the system
Update packages to ensure you have the latest dependencies and security fixes.
sudo apt update && sudo apt -y upgrade
sudo reboot
Step 2 (Optional): Enable NVIDIA GPU support
Install the latest proprietary NVIDIA driver and the NVIDIA Container Toolkit so Docker can use your GPU, and reboot when asked. Note that the last two commands (the runtime configuration and the Docker restart) assume Docker is already installed; if it is not, run them after completing Step 3.
# Install recommended NVIDIA driver
sudo ubuntu-drivers install
sudo reboot
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU visibility
nvidia-smi
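If you want to double-check that the toolkit registered itself with Docker, nvidia-ctk writes a runtime entry into Docker's daemon configuration. The following quick checks assume Docker is already installed (Step 3):
# The nvidia runtime should appear in the daemon configuration
cat /etc/docker/daemon.json
# Docker should list nvidia among its available runtimes
docker info | grep -i runtime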
Step 3: Install Docker Engine and Docker Compose plugin
Install Docker from the official repository to get the latest stable version. Add your user to the docker group to run Docker without sudo.
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
# Quick test
docker run --rm hello-world
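If hello-world runs, Docker is working. It is also worth confirming the Compose plugin version, since the gpus: shorthand used in the next step needs a reasonably recent release:
docker --version
docker compose version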
Step 4: Create a Docker Compose file for Ollama + Open WebUI
Create a working directory and define the services. The configuration below enables the GPU when present; note that the gpus: all shorthand requires a reasonably recent Docker Compose plugin. For CPU-only, remove or comment out the gpus: all line under the ollama service.
mkdir -p ~/local-llm && cd ~/local-llm
nano docker-compose.yml
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # Comment out the next line if you are CPU-only
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=24h

  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:latest
    depends_on:
      - ollama
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_this_long_random_secret
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
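Before starting the stack, you can have Compose parse and render the file, which catches YAML indentation mistakes early:
# Validate the file and print the resolved configuration
docker compose config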
Step 5: Start the stack
Bring the services up in the background, then check their status. Open WebUI will be available at http://SERVER_IP:3000 and the Ollama API at http://SERVER_IP:11434 (use localhost if you are working on the server itself).
docker compose up -d
docker compose ps
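A couple of quick checks confirm that both services are responding; the curl below hits Ollama's version endpoint and should return a small JSON payload.
# Ollama should answer with its version
curl http://localhost:11434/api/version
# Follow Open WebUI startup logs (Ctrl+C to stop)
docker compose logs -f open-webui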
Step 6: Pull and run a model
Use Ollama to pull an LLM. You can pick models like llama3.2, mistral, or a coding model. The first pull downloads model weights, which can be several GB.
# Pull a general-purpose model
docker exec -it ollama ollama pull llama3.2
# Test it via CLI
docker exec -it ollama ollama run llama3.2 "Write a two-sentence summary of Ubuntu."
# Or use the API
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello!"}'
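The generate endpoint streams JSON chunks by default; if you prefer a single response, disable streaming. You can also list the models already stored in the ollama volume.
# Single, non-streamed response
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello!","stream":false}'
# List downloaded models
docker exec -it ollama ollama list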
Open your browser to http://SERVER_IP:3000, select the model from the dropdown, and start chatting. In Settings, you can change default models, system prompts, and appearance.
Step 7: Secure access
If exposing the WebUI beyond your LAN, add authentication. In Open WebUI, create an admin user at first login, then disable new sign-ups in Settings. For internet exposure, place a reverse proxy (Nginx, Caddy, or Traefik) with HTTPS (Let's Encrypt) in front of port 3000, as sketched below.
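As a minimal sketch, a Caddy configuration like the one below (assuming a domain such as chat.example.com already points at your server) proxies HTTPS traffic to the WebUI; Caddy obtains and renews the Let's Encrypt certificate automatically.
# /etc/caddy/Caddyfile (example domain; replace with your own)
chat.example.com {
    reverse_proxy localhost:3000
}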
Step 8: Update and backup
To update, pull the latest images and recreate containers without losing data stored in volumes. To back up, save the volumes before upgrades.
# Update images
docker compose pull
docker compose up -d
# Backup volumes (example)
docker run --rm -v local-llm_ollama:/data -v "$PWD":/backup \
busybox tar czf /backup/ollama-vol.tgz /data
docker run --rm -v local-llm_openwebui:/data -v "$PWD":/backup \
busybox tar czf /backup/openwebui-vol.tgz /data
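Restoring works the same way in reverse: with the stack stopped, mount the volume and extract the archive into it. A sketch, assuming the archive names created above:
docker compose down
docker run --rm -v local-llm_ollama:/data -v "$PWD":/backup \
busybox tar xzf /backup/ollama-vol.tgz -C /
docker compose up -d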
Troubleshooting
GPU not used: Run docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi. If it fails, recheck the driver and NVIDIA Container Toolkit steps. Ensure the gpus: all line is present and Docker was restarted.
Permission denied with Docker: You may need to log out and back in after adding your user to the docker group, or run newgrp docker.
Port conflicts: Change the left side of port mappings in docker-compose.yml (e.g., use "8081:8080" for WebUI) and restart.
Slow or failed model pull: Verify disk space and retry (see the commands below). Large models require several GB of free space in the ollama volume.
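These commands show where the space is going; Docker stores volumes under /var/lib/docker by default.
# Free space on the Docker data root
df -h /var/lib/docker
# Space used by images, containers, and volumes
docker system df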
Uninstall (optional)
To stop and remove everything, run:
cd ~/local-llm
docker compose down
docker volume rm local-llm_ollama local-llm_openwebui
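Alternatively, docker compose down -v removes the containers and the named volumes in one step, and you can remove the downloaded images as well to reclaim their disk space.
# One-step cleanup (also deletes the volumes)
docker compose down -v
# Optionally remove the images too
docker image rm ollama/ollama:latest ghcr.io/open-webui/open-webui:latest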
Wrap-up
You have a modern local AI stack: Ollama for fast model serving and Open WebUI for a friendly, multi-model chat interface. With Docker, updates are quick and backups are simple. Add your favorite models, tune system prompts, and integrate the Ollama API into your apps—all without sending data to the cloud.