How to Deploy a Private AI Assistant with Ollama and Open WebUI on Ubuntu Server (Docker)

Overview

If you want an AI assistant for internal documentation, troubleshooting, or drafting replies without sending company data to a third-party cloud, a self-hosted setup is a strong option. In this tutorial, you will deploy a private AI stack on an Ubuntu Server using Docker: Ollama (to run large language models locally) and Open WebUI (a clean web interface for chatting, prompts, and basic management). This approach is practical for homelabs and small teams, and it keeps your prompts and conversation history inside your own network.

What You Will Build

By the end, you will have two containers running: one for Ollama (the model runtime/API) and one for Open WebUI (the front-end). You will also configure persistent storage, pull a model, and confirm everything works from a browser. The steps below are written for Ubuntu Server 22.04/24.04, but they should also work on most recent Ubuntu releases.

Prerequisites

You need an Ubuntu Server with at least 8 GB of RAM (16 GB recommended), at least 20 GB of free disk space (each model typically takes several gigabytes), and a modern CPU. A GPU improves performance but is not required for a functional deployment. You also need root or sudo access and a working network connection. If the server sits on a LAN, decide which port you will expose for the web interface; this tutorial uses 3000.
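
Before you continue, you can quickly verify the hardware with standard Ubuntu tools (the figures above are guidelines, not hard limits):

free -h
df -h /
nproc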

Step 1: Install Docker and Docker Compose

First, install Docker using the official repository packages. This ensures you get up-to-date components and fewer compatibility issues with Compose.

Run:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

Optionally allow your user to run Docker without sudo (log out and back in afterward):

sudo usermod -aG docker $USER
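
To verify the installation, check the versions and run Docker's test image (exact version numbers will vary):

docker --version
docker compose version
docker run --rm hello-world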

Step 2: Create a Project Folder and Compose File

Create a directory to keep your deployment clean and manageable. Then create a docker-compose.yml file that defines both services and persistent volumes.

mkdir -p ~/ai-stack
cd ~/ai-stack
nano docker-compose.yml

Paste the following Compose configuration:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  openwebui:
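
If your server has an NVIDIA GPU and the NVIDIA Container Toolkit installed, you can optionally give the ollama service access to it by adding a deploy block under that service. The snippet below is the standard Compose GPU reservation syntax; leave it out on CPU-only servers.

  ollama:
    # ...existing settings from above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]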

Step 3: Start the Stack

Bring up the containers in the background and confirm they are running.

docker compose up -d
docker compose ps

If you see both services with a “running” state, the base deployment is complete.
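
You can also check that the Ollama API responds from the host. The root endpoint returns a short status message (typically "Ollama is running"):

curl http://localhost:11434/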

Step 4: Pull a Model with Ollama

Now download a model. The best choice depends on your RAM and use case. For many servers, a smaller model is a safe starting point. The command below pulls a popular lightweight model.

docker exec -it ollama ollama pull llama3.2

You can list installed models anytime:

docker exec -it ollama ollama list
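
Before moving to the browser, you can send a one-off prompt from the command line to confirm the model loads and responds (the first reply can take a while because the model has to be loaded into memory):

docker exec -it ollama ollama run llama3.2 "Summarize what DNS does in one sentence."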

Step 5: Log In to Open WebUI and Connect to Ollama

Open a browser and go to http://SERVER-IP:3000. On first launch, Open WebUI asks you to create an admin account. After login, the interface should automatically detect Ollama through the internal Docker network using the OLLAMA_BASE_URL you configured.

Start a new chat, select the model you pulled (for example llama3.2), and send a test prompt such as “Write a short troubleshooting checklist for DNS issues.” If the response appears, your private AI assistant is working end-to-end.

Step 6: Basic Hardening and Practical Tips

Firewall: If this is a public-facing server, do not expose it directly without protection. At minimum, allow only your LAN or VPN subnet to reach port 3000. With UFW, you can restrict access instead of opening the port to everyone.
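
For example, assuming a LAN subnet of 192.168.1.0/24 (adjust this to your own network), the UFW rules could look like this; make sure SSH is allowed before enabling the firewall:

sudo ufw allow OpenSSH
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw enable
sudo ufw status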

Reverse proxy: For production use, place Open WebUI behind Nginx or Caddy with HTTPS and authentication. This also makes it easier to use a friendly hostname.
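
As a minimal sketch, a Caddyfile like the one below proxies HTTPS traffic to Open WebUI; it assumes the hostname ai.example.com points at this server and that ports 80 and 443 are reachable so Caddy can obtain a certificate automatically:

ai.example.com {
    reverse_proxy localhost:3000
}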

Backups: Your important data lives in Docker volumes. Back up the Open WebUI volume (chat history, settings) and the Ollama volume (models) according to your retention needs.
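
One simple approach is to archive each volume with a throwaway container. The volume names below assume the default Compose project name derived from the ~/ai-stack directory; confirm yours with docker volume ls, and consider stopping the stack first (docker compose stop) for a consistent snapshot:

docker run --rm -v ai-stack_openwebui:/data -v "$PWD":/backup alpine tar czf /backup/openwebui-$(date +%F).tar.gz -C /data .
docker run --rm -v ai-stack_ollama:/data -v "$PWD":/backup alpine tar czf /backup/ollama-$(date +%F).tar.gz -C /data .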

Updates: Refresh images regularly to get security fixes and new features:

docker compose pull
docker compose up -d
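
After an update, you can optionally remove old, unused image layers to reclaim disk space:

docker image prune -f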

Troubleshooting

Open WebUI loads but no models appear: Verify Ollama is reachable from the Open WebUI container. Check logs with docker logs open-webui and confirm OLLAMA_BASE_URL=http://ollama:11434 is correct.

Model downloads are slow: Large model pulls can take time. Ensure your server has stable internet and enough free disk. You can also choose smaller models to start.

High RAM usage or slow responses: Use a smaller model, reduce concurrent users, or run the service on hardware with more memory. Local AI is resource-intensive by design, and tuning is part of a realistic deployment.
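
To see which container is actually consuming CPU and memory in real time, watch the live resource view:

docker stats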

Conclusion

Running Ollama and Open WebUI on Ubuntu Server gives you a private, self-hosted AI assistant that you can control, secure, and integrate into your workflow. Once the base stack is stable, you can expand it with HTTPS, SSO, logging, and routine backups. The key advantage is simple: your prompts and internal context stay on your infrastructure while still giving your team an easy web-based AI experience.
