How to Deploy a Private AI Chatbot with Ollama and Open WebUI on Ubuntu (Docker)

Why run a private AI chatbot?

If you like the convenience of ChatGPT-style assistants but need better privacy, lower latency on your local network, or predictable costs, a self-hosted setup is a strong option. With Ollama you can run modern large language models (LLMs) locally, and with Open WebUI you get a clean web interface for chatting, managing models, and organizing prompts. In this tutorial you will deploy both on an Ubuntu server using Docker, so the install is repeatable and easy to maintain.

What you will build

By the end, you will have:

1) Ollama running as a service (the model runtime)
2) Open WebUI running in Docker (the chat UI)
3) Persistent storage for models and chat data
4) Optional notes on GPU acceleration if your server has an NVIDIA GPU

Prerequisites

Use an Ubuntu 22.04/24.04 server (VM or bare metal). A modern CPU and at least 8 GB of RAM are workable for smaller models; 16–32 GB is more comfortable. You also need a user with sudo rights, outbound internet access to pull images and models, and enough free disk space for the models you plan to keep (Docker itself is installed in Step 1). If you plan to expose the UI beyond your LAN, put it behind a reverse proxy with TLS.
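
To sanity-check the host before you start (an optional, quick look at the Ubuntu release, available memory, and free disk space):

Commands:
lsb_release -a
free -h
df -h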

Step 1: Install Docker and Docker Compose

First, install Docker from Ubuntu’s repository (simple and reliable for most homelab and SMB setups):

Commands:
sudo apt update
sudo apt install -y docker.io docker-compose-v2
sudo systemctl enable --now docker

Add your user to the docker group so you can run Docker without sudo (log out/in after this):

Command:
sudo usermod -aG docker $USER
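
After logging back in, confirm Docker works without sudo; hello-world is Docker's standard test image:

Command:
docker run hello-world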

Step 2: Create folders for persistent data

Persistent volumes are important because LLM files can be large and you do not want to re-download models after every container update. Create a working directory:

Commands:
mkdir -p ~/ai-stack/ollama
mkdir -p ~/ai-stack/openwebui
cd ~/ai-stack

Step 3: Create a Docker Compose file

Create a file named docker-compose.yml in ~/ai-stack. This setup runs Ollama and Open WebUI on the same Docker network. Ollama listens on port 11434 (also published to the host so you can reach its API directly); Open WebUI is published on port 3000.

docker-compose.yml:

Copy and paste:
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - "11434:11434"

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - ./openwebui:/app/backend/data
    ports:
      - "3000:8080"

Step 4: Start the services

Bring the stack up in detached mode:

Command:
docker compose up -d

Verify containers are running:

Command:
docker ps
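
If a container is missing or keeps restarting, its logs usually explain why. For example, to follow Open WebUI's startup output (Ctrl+C to stop):

Command:
docker compose logs -f openwebui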

Step 5: Open the Web UI and pull a model

In a browser, open:

http://YOUR_SERVER_IP:3000

Open WebUI will ask you to create an admin account on first run. After login, you can download models through the interface, or you can pull models from the server side using Ollama.

To pull a popular small model (good for testing), run:

Command:
docker exec -it ollama ollama pull llama3.2

Once the model is downloaded, refresh Open WebUI and select the model for chat. If you want a lighter footprint, try a smaller-parameter model; if you need better answers, step up to a larger model and budget more RAM/VRAM for it.
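
You can also check what is installed and run a quick test from the command line; both commands use Ollama's CLI inside the container, and the short prompt is just an arbitrary example:

Commands:
docker exec -it ollama ollama list
docker exec -it ollama ollama run llama3.2 "Explain Docker Compose in one sentence."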

Step 6: Basic troubleshooting (the common issues)

Open WebUI loads but shows no models: Confirm the environment variable points to Ollama. Run docker logs openwebui and make sure it can reach http://ollama:11434. Also verify Ollama is healthy with curl http://localhost:11434 on the host.

Model downloads are slow or fail: Check free disk space (df -h) and confirm the server can resolve and reach external hosts. LLM downloads can be multiple gigabytes, so a nearly full disk will cause strange errors.

High CPU and slow replies: This is normal on CPU-only servers with larger models. Use a smaller model, reduce concurrent users, or add GPU acceleration.
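
As a quick reference, the checks mentioned above, run on the host:

Commands:
docker logs openwebui
curl http://localhost:11434
df -h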

Optional: NVIDIA GPU acceleration notes

If you have an NVIDIA GPU, install the NVIDIA driver and the NVIDIA Container Toolkit so Docker containers can access the GPU. Then adjust the Ollama service to request GPU resources (exact configuration depends on your Docker and driver versions). GPU support can dramatically improve response time and allow you to run larger models smoothly.
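
As a rough sketch, assuming the NVIDIA driver and the NVIDIA Container Toolkit are already installed per NVIDIA's documentation, the usual sequence is to register the NVIDIA runtime with Docker, restart Docker, and confirm a container can see the GPU:

Commands:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all ubuntu nvidia-smi

In the Compose file, the ollama service then needs a GPU device reservation (for example via deploy.resources.reservations.devices with the nvidia driver), as shown in the Docker and Ollama documentation.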

Step 7: Keep it secure and maintainable

For a safer deployment, do not expose port 3000 directly to the internet. Put Open WebUI behind Nginx or Caddy with HTTPS and authentication. For updates, pull new images and recreate containers:

Commands:
cd ~/ai-stack
docker compose pull
docker compose up -d
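
Optionally, reclaim space from superseded (dangling) image layers afterwards:

Command:
docker image prune -f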

Because you used persistent volumes, your downloaded models and chat database stay intact across updates.

Wrap-up

Running Ollama with Open WebUI on Ubuntu gives you a practical private AI chatbot you can use for internal documentation, code explanations, drafting emails, and brainstorming without sending prompts to a third-party cloud service. Start with a smaller model to confirm everything works, then scale up based on your hardware and the quality you need.
