Overview
This step-by-step guide shows how to deploy a private, local AI stack on Ubuntu using Docker: Ollama for running large language models (LLMs) and Open WebUI as a fast, friendly chat interface. The setup works on CPUs and supports NVIDIA GPUs for acceleration. You will get a secure, self-hosted environment where you can run models like Llama 3.2, Phi-4, and Mistral without sending data to the cloud.
Prerequisites
- Ubuntu 22.04 or 24.04 (server or desktop)
- 16 GB RAM recommended (more for larger models), 30+ GB free disk
- Docker Engine and the Docker Compose plugin
- Optional: NVIDIA GPU with proprietary driver installed (e.g., 535+)
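You can quickly check these requirements before you begin; the commands below only read system information (nvidia-smi is present only if the NVIDIA driver is installed):
free -h       # installed and available RAM
df -h /       # free disk space on the root filesystem
nvidia-smi    # GPU model and driver version (optional, NVIDIA only)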
1) Install Docker and Docker Compose
Update your system and install Docker Engine from Docker's official APT repository, which provides current, maintained packages rather than the older versions in Ubuntu's default archive.
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
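Verify the installation (if docker still asks for sudo, log out and back in so the group change takes full effect):
docker --version
docker compose version
docker run --rm hello-world   # pulls and runs a small test container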
2) (Optional) Enable NVIDIA GPU in Containers
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can pass the GPU into containers:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify your driver with nvidia-smi. The container will get GPU access when you run it with --gpus all.
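A quick way to confirm that containers can reach the GPU is to run nvidia-smi inside a disposable container; the toolkit injects the driver utilities, so a plain Ubuntu image is enough:
docker run --rm --gpus all ubuntu nvidia-smi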
3) Create a Docker Network and Volumes
Create a dedicated network so services can talk by name and set up persistent storage:
docker network create llmnet
docker volume create ollama
docker volume create open-webui
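A quick listing confirms both resources exist before you start any containers:
docker network ls | grep llmnet
docker volume ls | grep -E 'ollama|open-webui'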
4) Run the Ollama Container
Start Ollama. For CPU-only:
docker run -d --name ollama --restart=unless-stopped --network llmnet -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
With NVIDIA GPU acceleration (detected automatically):
docker run -d --name ollama --restart=unless-stopped --network llmnet --gpus all -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
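Ollama listens on port 11434 as soon as the container is up; a quick check from the host:
curl http://localhost:11434/    # should print "Ollama is running"
docker logs --tail 20 ollama    # startup messages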
5) Pull a Model
You can manage models from the host via docker exec. Pull a lightweight model to start quickly:
docker exec -it ollama ollama pull llama3.2:3b
Test generation from the command line (by default the response streams back as a series of JSON objects, one per line):
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line."}'
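If you prefer a single JSON response instead of a stream, pass "stream": false in the request body:
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Say hello in one short line.","stream":false}'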
6) Launch Open WebUI
Open WebUI provides a clean chat interface and model manager. Start it on port 3000 and point it to the Ollama endpoint:
docker run -d --name open-webui --restart=unless-stopped --network llmnet -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:latest
Open a browser and visit http://localhost:3000 (or your server IP). Create the first admin account, select a model (e.g., llama3.2:3b), and start chatting. If a model is missing, Open WebUI can pull it automatically via Ollama.
7) Optional: Use Docker Compose
Prefer to keep everything in a single file? Create docker-compose.yml in an empty folder and paste:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    networks: [llmnet]
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    networks: [llmnet]
    volumes:
      - open-webui:/app/backend/data

networks:
  llmnet:
    external: true

volumes:
  ollama:
    external: true
  open-webui:
    external: true
Start with docker compose up -d from the folder containing docker-compose.yml (the llmnet network and the two volumes from step 3 must already exist, since they are declared external). For GPU acceleration, either keep using the docker run command with --gpus all or add a device reservation to the ollama service, as sketched below.
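A minimal sketch of that device reservation (assuming the NVIDIA Container Toolkit from step 2 is installed); merge it into the existing ollama: service in your Compose file:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]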
8) Securing and Updating
- Restrict access: if running on a server, firewall ports 11434 and 3000 so only trusted IPs can reach them, or publish them only on localhost (e.g., -p 127.0.0.1:3000:8080).
- Reverse proxy: place Nginx or Caddy in front with HTTPS for remote access.
- Updates: pull newer images and recreate the containers: docker pull ollama/ollama:latest and docker pull ghcr.io/open-webui/open-webui:latest, then stop, remove, and re-run each container as shown below. Your data persists in the named volumes.
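A typical update cycle, using the container names from steps 4 and 6 (with Compose, docker compose pull followed by docker compose up -d does the same):
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop open-webui ollama
docker rm open-webui ollama
# re-run the docker run commands from steps 4 and 6; the named volumes keep your models and chat history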
9) Troubleshooting
- Check logs: docker logs -f ollama and docker logs -f open-webui.
- Port in use: change published ports (e.g., -p 3001:8080).
- GPU not detected: validate nvidia-smi, reinstall the NVIDIA Container Toolkit, and ensure --gpus all is present.
- Disk space: models are large; prune unused Docker data with docker system prune and remove models you no longer need with ollama rm, as shown after this list.
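For example, to see which models are using space in the ollama volume and remove one (the model tag below is just an example):
docker exec -it ollama ollama list            # installed models and their sizes
docker exec -it ollama ollama rm mistral:7b   # delete a model you no longer use
docker system prune                           # remove unused Docker data (asks for confirmation)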
10) Quick API and CLI Examples
- Pull another model: docker exec -it ollama ollama pull phi4:latest
- Chat from CLI: docker exec -it ollama ollama run mistral:7b
- Simple REST call: curl http://localhost:11434/api/generate -d '{"model":"phi4:latest","prompt":"Give me two bullet points about container security."}'
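Ollama also exposes a chat-style endpoint, /api/chat, that accepts a list of messages; a non-streaming example with the phi4 model pulled above:
curl http://localhost:11434/api/chat -d '{"model":"phi4:latest","messages":[{"role":"user","content":"Name one benefit of running LLMs locally."}],"stream":false}'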
You now have a modern, private AI stack using Docker, Ollama, and Open WebUI on Ubuntu. It is fast, flexible, and ready for local development, internal knowledge assistants, and offline experimentation—no cloud required.