Run a Local AI Chatbot with Ollama and Open WebUI on Ubuntu (GPU + Docker)

Local large language models are now practical on a single server. In this step-by-step guide, you will deploy a private AI chatbot by running Ollama (for models) and Open WebUI (for the user interface) on Ubuntu using Docker. We will enable GPU acceleration with NVIDIA so responses are fast and efficient. By the end, you will have a persistent setup that survives reboots and is easy to update.

Overview

Ollama is a lightweight runtime that downloads and serves popular open-source models like Llama 3. Open WebUI is a web app that connects to Ollama and provides a clean chat interface, prompt templates, conversation history, and model management. We will run both components in Docker containers on the same Docker network and map persistent volumes for data. Optional GPU acceleration uses the NVIDIA Container Toolkit.

Prerequisites

- Ubuntu Server or Desktop (22.04 or 24.04 recommended)
- An NVIDIA GPU with the proprietary driver installed if you want GPU acceleration (verify with nvidia-smi; see the quick check after this list); CPU-only also works
- Sudo access and outbound internet connectivity
- Basic command line familiarity
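
If you want to double-check these before starting, the commands below show the driver status and the Ubuntu release (output will differ on your machine; skip nvidia-smi on CPU-only hosts):

# NVIDIA driver check (GPU setups only)
nvidia-smi
# Ubuntu release
. /etc/os-release && echo $VERSION_ID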

1) Install Docker Engine and Compose

Install the official Docker packages and add your user to the docker group so you can run Docker commands without sudo. The newgrp command below applies the new group to the current shell only; log out and back in for it to take effect everywhere.

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
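
Before moving on, you can confirm that Docker and the Compose plugin are installed correctly:

docker --version
docker compose version
docker run --rm hello-world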

2) Enable NVIDIA GPUs in Docker (optional but recommended)

If you have an NVIDIA GPU and the driver is working (nvidia-smi succeeds on the host), install the NVIDIA Container Toolkit so Docker can pass the GPU through to containers, then restart Docker.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Optionally, confirm that containers can see the GPU by running a CUDA base image:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

3) Create a Docker network and persistent volumes

We will create an isolated network for both containers and define persistent volumes so model files and WebUI data survive restarts.

docker network create ollama-net
docker volume create ollama
docker volume create open-webui
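
A quick check that the network and volumes exist:

docker network ls | grep ollama-net
docker volume ls | grep -E 'ollama|open-webui'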

4) Run Ollama (model server)

Start the Ollama container. If you have a GPU, include --gpus all. Port 11434 exposes the Ollama API.

# GPU-enabled
docker run -d --name ollama --gpus all --restart unless-stopped \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  --network ollama-net \
  ollama/ollama:latest

# CPU-only (if you do not have an NVIDIA GPU)
# docker run -d --name ollama --restart unless-stopped \
#   -p 11434:11434 \
#   -v ollama:/root/.ollama \
#   --network ollama-net \
#   ollama/ollama:latest

Pull a model and run a quick test inside the container. Llama 3.1 and Qwen 2.5 are good starting options.

docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama run llama3.1:8b "Write a two-line poem about local AI."
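
Because port 11434 is published to the host, you can also talk to the Ollama API directly, for example to list installed models or run a one-off prompt:

# list installed models
curl http://localhost:11434/api/tags
# generate a short completion (non-streaming)
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in five words.", "stream": false}'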

5) Run Open WebUI (front-end)

Open WebUI connects to the Ollama API. The first account you create after launch becomes the admin account. We point it at Ollama using the container name on the shared Docker network.

docker run -d --name open-webui --restart unless-stopped \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open-webui:/app/backend/data \
  --network ollama-net \
  ghcr.io/open-webui/open-webui:latest

Open a browser to http://<server-ip>:3000. Create your account, select the model you pulled (for example, llama3.1:8b), and start chatting. You can pull additional models anytime using docker exec -it ollama ollama pull qwen2.5:7b and select them in Open WebUI.
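
To review what is installed, or to reclaim disk space from a model you no longer use, the ollama CLI inside the container provides list and rm commands (the model name below is just an example):

docker exec -it ollama ollama list
# docker exec -it ollama ollama rm qwen2.5:7b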

6) Optional: Use Docker Compose instead of docker run

If you prefer a single file, create docker-compose.yml in an empty folder. GPU access is enabled through the deploy device reservation shown below (or the newer gpus: all key) when the NVIDIA Container Toolkit is installed.

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    networks:
      - ollama-net
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    # Alternatively, newer Compose releases support the shorthand:
    # gpus: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    networks:
      - ollama-net

volumes:
  ollama:
  open-webui:

networks:
  ollama-net:
    external: true

Then run:

docker network create ollama-net
docker compose up -d
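
To confirm both services started and to follow their logs:

docker compose ps
docker compose logs -f ollama open-webui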

7) Updating and maintenance

To update images, pull the latest versions and recreate the containers. Your data and models remain in volumes.

docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama && docker rm open-webui ollama
# re-run the docker run commands (or docker compose up -d)
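
If you deployed with Compose, the equivalent update is:

docker compose pull
docker compose up -d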

To see logs for troubleshooting, run docker logs -f ollama and docker logs -f open-webui. If the WebUI cannot see models, ensure the environment variable OLLAMA_BASE_URL points to http://ollama:11434 and both containers share the same Docker network.
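
A quick way to verify both conditions from the host:

# confirm the WebUI container sees the expected Ollama URL
docker exec open-webui printenv OLLAMA_BASE_URL
# confirm both containers are attached to the same network
docker network inspect ollama-net | grep -E '"Name": "(ollama|open-webui)"'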

Troubleshooting tips

GPU not detected: Confirm the NVIDIA driver works on the host (nvidia-smi), the NVIDIA Container Toolkit is installed, and the container uses --gpus all. If using Compose, ensure gpus: all or the device reservation is defined.
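
To confirm that the running container actually sees the GPU and that Docker registered the NVIDIA runtime:

# nvidia-smi is injected into the container by the NVIDIA runtime
docker exec -it ollama nvidia-smi
# the Docker daemon should list an nvidia runtime
docker info | grep -i nvidia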

Ports already in use: Change the host side of the port mappings (for example, publish Open WebUI with -p 8080:8080 instead of -p 3000:8080).

Slow downloads or storage limits: Models are large. Consider attaching a larger Docker volume or moving /var/lib/docker to a disk with more space. You can also choose smaller models (7B) or quantized variants.
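
To see where the space is going:

# overall Docker disk usage (images, containers, volumes)
docker system df
# per-model sizes inside the Ollama volume
docker exec -it ollama ollama list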

HTTPS and access control: Put Open WebUI behind a reverse proxy such as Nginx or Caddy with HTTPS and firewall rules. For internet exposure, add authentication, rate limits, and consider a VPN or zero-trust tunnel.

What you built

You now have a local, private AI chatbot with GPU acceleration on Ubuntu using Docker. Ollama handles model serving, while Open WebUI gives you a friendly interface with history, prompts, and multi-model management. This setup is repeatable, easy to update, and keeps your data on your own hardware.