Deploy Local LLMs on Ubuntu: Ollama + Open WebUI with Docker (GPU-Ready)

Overview

This step-by-step guide shows how to deploy a private, local AI stack on Ubuntu using Docker: Ollama for running large language models (LLMs) and Open WebUI as a fast, friendly chat interface. The setup works on CPU-only machines and supports NVIDIA GPUs for acceleration. The result is a self-hosted environment where you can run models such as Llama 3.2, Phi-4, and Mistral without sending data to the cloud.

Prerequisites

- Ubuntu 22.04 or 24.04 (server or desktop)
- 16 GB RAM recommended (more for larger models) and 30+ GB of free disk; quick checks follow this list
- Docker Engine and the Docker Compose plugin
- Optional: NVIDIA GPU with proprietary driver installed (e.g., 535+)
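
You can confirm these prerequisites with a few standard commands: lsb_release -ds prints the Ubuntu release, free -h the installed RAM, df -h / the free disk space, and nvidia-smi (NVIDIA only) the driver version.

lsb_release -ds
free -h
df -h /
nvidia-smi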

1) Install Docker and Docker Compose

Update your system and install Docker from the official repository for best stability and performance.

sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
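
The group change applies to new shells (newgrp docker activates it in the current one). A quick way to confirm the installation works without sudo is to check the versions and run the hello-world image:

docker --version
docker compose version
docker run --rm hello-world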

2) (Optional) Enable NVIDIA GPU in Containers

If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can pass the GPU into containers:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify the NVIDIA driver on the host with nvidia-smi. Containers get GPU access when you start them with --gpus all; a quick end-to-end check is shown below.
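
As a sanity check, you can run nvidia-smi inside a throwaway container; the toolkit injects the driver utilities, so the plain ubuntu image is enough:

docker run --rm --gpus all ubuntu nvidia-smi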

3) Create a Docker Network and Volumes

Create a dedicated network so services can talk by name and set up persistent storage:

docker network create llmnet
docker volume create ollama
docker volume create open-webui
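
You can confirm the network and volumes exist before starting any containers:

docker network ls | grep llmnet
docker volume ls | grep -E 'ollama|open-webui'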

4) Run the Ollama Container

Start Ollama. For CPU-only:

docker run -d --name ollama --restart=unless-stopped --network llmnet -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest

With NVIDIA GPU acceleration (Ollama detects and uses the GPU automatically once the container can see it):

docker run -d --name ollama --restart=unless-stopped --network llmnet --gpus all -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
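
Whichever variant you start, confirm the container is running and the API answers; the root endpoint should reply with a short status message such as "Ollama is running":

docker ps --filter name=ollama
curl http://localhost:11434/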

5) Pull a Model

You can manage models from the host via docker exec. Pull a lightweight model to start quickly:

docker exec -it ollama ollama pull llama3.2:3b

Test generation from the command line:

curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line."}'
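
Note that /api/generate streams newline-delimited JSON by default; adding "stream": false returns a single JSON object, which is easier to read in a terminal. You can also list the models Ollama has downloaded:

curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line.","stream":false}'
docker exec -it ollama ollama list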

6) Launch Open WebUI

Open WebUI provides a clean chat interface and model manager. Start it on port 3000 and point it to the Ollama endpoint:

docker run -d --name open-webui --restart=unless-stopped --network llmnet -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest

Open a browser and visit http://localhost:3000 (or your server IP). Create the first admin account, select a model (e.g., llama3.2:3b), and start chatting. If a model is missing, Open WebUI can pull it automatically via Ollama.
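
If the page does not load, check the container logs and confirm the published port answers:

docker logs -f open-webui
curl -I http://localhost:3000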

7) Optional: Use Docker Compose

Prefer to keep everything in a single file? Create docker-compose.yml in an empty folder and paste:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    networks: [llmnet]
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    networks: [llmnet]
    volumes:
      - open-webui:/app/backend/data
networks:
  llmnet:
    external: true
volumes:
  ollama:
  open-webui:

Start the stack with docker compose up -d. For GPU acceleration, either stick with the docker run method and --gpus all, or add an NVIDIA device reservation to the ollama service in Compose, as sketched below.
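
One way to do this without touching the main file is a Compose override. The snippet below is a minimal sketch using Compose's device-reservation syntax; Compose merges docker-compose.override.yml automatically the next time you bring the stack up:

cat > docker-compose.override.yml <<'EOF'
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
docker compose up -d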

8) Securing and Updating

- Restrict access: if the stack runs on a server, firewall ports 11434 and 3000 so only trusted IPs can reach them (a UFW example follows this list).
- Reverse proxy: place Nginx or Caddy in front with HTTPS for remote access.
- Updates: pull newer images with docker pull ollama/ollama:latest and docker pull ghcr.io/open-webui/open-webui:latest, then stop, remove, and re-run the containers (or use docker compose pull && docker compose up -d with Compose). Your data persists in the named volumes; see the sequence after this list.
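
For example, with UFW you might allow only a trusted subnet (192.168.1.0/24 here is a placeholder for your own network), and the update sequence for the docker run setup looks like this:

sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama && docker rm open-webui ollama

Then re-run the docker run commands from steps 4 and 6; the volumes keep your models and chat history.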

9) Troubleshooting

- Check logs: docker logs -f ollama and docker logs -f open-webui.
- Port in use: change published ports (e.g., -p 3001:8080).
- GPU not detected: validate nvidia-smi, reinstall the NVIDIA Container Toolkit, and ensure --gpus all is present.
- Disk space: models are large; prune unused Docker data with docker system prune and delete models you no longer need with ollama rm (example after this list).
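
For example, to see which models are taking up space and remove one (the model name here is just whichever one you no longer need):

docker exec -it ollama ollama list
docker exec -it ollama ollama rm llama3.2:3b
docker system prune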

10) Quick API and CLI Examples

- Pull another model: docker exec -it ollama ollama pull phi4:latest
- Chat from CLI: docker exec -it ollama ollama run mistral:7b
- Simple REST call: curl http://localhost:11434/api/generate -d '{"model":"phi4:latest","prompt":"Give me two bullet points about container security."}' (this streams by default; a non-streaming chat example follows this list)
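
Ollama also exposes a chat-style endpoint that takes a list of messages; a minimal non-streaming call looks like this:

curl http://localhost:11434/api/chat -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"Name one benefit of running LLMs locally."}],"stream":false}'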

You now have a modern, private AI stack using Docker, Ollama, and Open WebUI on Ubuntu. It is fast, flexible, and ready for local development, internal knowledge assistants, and offline experimentation—no cloud required.
