Deploy a Private RAG Chatbot with Ollama and Open WebUI (No Cloud Required)

Why a private RAG chatbot?

If your team needs an internal chatbot that can answer questions from company documents, you’ve probably looked at cloud AI services. The problem is compliance: sending sensitive data outside your network can be a deal-breaker. A practical alternative is a private RAG setup (Retrieval-Augmented Generation), where a local language model generates answers while a local index retrieves relevant text from your own files. In this tutorial, you’ll build a private RAG chatbot on a Linux server using Ollama (local LLM runtime) and Open WebUI (a friendly chat interface), then connect your documents to it.

What you will build

You will deploy two services with Docker: Ollama to run a model locally, and Open WebUI to provide a web-based chat UI and document ingestion features. This approach is ideal for homelabs, IT departments, and helpdesk teams who want AI-assisted answers without exposing internal knowledge to third parties.

Prerequisites

You need a Linux server or VM (Ubuntu/Debian recommended) with at least 8 GB RAM for smaller models; 16 GB+ is better. Disk space depends on the model (expect several GB). You also need Docker and Docker Compose. If you have an NVIDIA GPU, you can accelerate inference, but this guide works on CPU as well.
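
If you plan to try GPU acceleration later, a quick sanity check is to confirm the host can see the card (this assumes the NVIDIA driver is already installed; containers additionally need the NVIDIA Container Toolkit):

Command:
nvidia-smi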

Step 1: Install Docker and Docker Compose

On Ubuntu, install Docker and the Compose plugin from the distribution repositories (on recent releases the Compose v2 plugin is packaged as docker-compose-v2; Docker's own apt repository calls it docker-compose-plugin), then enable the service:

Commands:
sudo apt update
sudo apt install -y docker.io docker-compose-v2
sudo systemctl enable --now docker
sudo usermod -aG docker $USER

Log out and back in so your user can run Docker without sudo. Verify with:

Command:
docker version
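
It is also worth confirming that the Compose plugin is available:

Command:
docker compose version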

Step 2: Create a Docker Compose file

Create a working directory and a compose file. This setup stores model files and WebUI data in persistent volumes so upgrades won’t wipe your configuration.

Commands:
mkdir -p ~/private-rag
cd ~/private-rag
nano docker-compose.yml

Paste the following content:

docker-compose.yml
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  openwebui:
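
Before starting anything, you can have Compose validate the file and print the resolved configuration, which catches indentation and syntax mistakes early:

Command:
docker compose config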

Step 3: Start the services

Bring the stack online:

Command:
docker compose up -d

Check that both containers are healthy:

Command:
docker ps
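
You can also confirm that Ollama's API is responding from the host (assuming the default port mapping from the compose file):

Command:
curl http://localhost:11434/api/version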

Open http://YOUR_SERVER_IP:3000 in your browser to reach Open WebUI. The first account you create becomes the admin by default, so choose a strong password.

Step 4: Pull a model with Ollama

You can pull models directly inside the Ollama container. A good starting point for many servers is a smaller, fast instruct model.

Command:
docker exec -it ollama ollama pull llama3.1:8b
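
On a machine with less RAM, a smaller model is a safer starting point. For example (model names and tags change over time, so check the Ollama model library for current options):

Command:
docker exec -it ollama ollama pull llama3.2:3b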

If you have more resources, you can experiment with larger variants for better reasoning. After pulling, confirm what's available:

Command:
docker exec -it ollama ollama list
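
A quick one-off generation from the CLI confirms the model actually loads and responds (the first run can take a while on CPU while the model is loaded into memory):

Command:
docker exec -it ollama ollama run llama3.1:8b "Reply with one short sentence."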

Step 5: Connect Open WebUI to the local model

In Open WebUI, open the model selection menu and choose the model you pulled (for example, llama3.1:8b). Start a basic chat to confirm responses are generated locally. If you get an error, verify that Open WebUI can reach Ollama over the internal Docker network and that the OLLAMA_BASE_URL environment variable matches the compose file (http://ollama:11434).
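
If something is off, the container logs usually show the connection error, for example Open WebUI failing to reach http://ollama:11434:

Commands:
docker logs --tail 50 open-webui
docker logs --tail 50 ollama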

Step 6: Enable RAG by adding your documents

To turn a general chatbot into a “knows our docs” assistant, ingest your content. In Open WebUI, find the section for Documents or Knowledge (wording may vary by version). Upload text-heavy sources such as internal runbooks, SOPs, FAQs, or exported wiki pages. For best retrieval results, prefer clean text formats like TXT, MD, PDF (machine-readable), and avoid scans without OCR.
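
If some sources are in office formats, converting them to Markdown or plain text first often improves retrieval. One option, assuming pandoc is installed (the filename here is just an example):

Command:
pandoc runbook.docx -o runbook.md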

After upload, Open WebUI will index the content so it can retrieve relevant chunks during chat. Test with a question that can only be answered from your document set, such as “What is our VPN reset procedure?” The response should cite or clearly reflect your internal wording. If the answer seems generic, add more targeted documents or refine your question.

Step 7: Secure access (quick hardening)

A private AI system can still leak data if it is publicly exposed. First, restrict access to trusted networks with a firewall (UFW on Ubuntu is simple; an example rule is shown after the update commands below) and consider putting Open WebUI behind a reverse proxy with HTTPS. Also, keep the services updated:

Commands:
cd ~/private-rag
docker compose pull
docker compose up -d
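
As an example of firewall scoping with UFW, you could allow the WebUI port only from a trusted subnet (adjust 192.168.1.0/24 and the port to match your network; this assumes UFW is installed and enabled):

Commands:
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw status

Be aware that Docker publishes ports through its own iptables rules, which can bypass UFW, so if you run a reverse proxy on the same host, also consider binding the published port to localhost in the compose file (for example "127.0.0.1:3000:8080").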

Finally, treat uploaded documents as sensitive: only allow authenticated users, and review what content is ingested. A RAG chatbot is powerful precisely because it can surface internal text quickly.

Troubleshooting tips

Model is slow: Use a smaller model, add RAM, or use GPU acceleration. Also reduce concurrent users.
WebUI can’t see the model: Confirm Ollama is running and reachable on port 11434 inside Docker, and that the model is listed in ollama list.
RAG answers are inaccurate: Upload more relevant documents, remove outdated versions, and prefer clean text sources. Retrieval quality depends heavily on document quality.

Next steps

Once your private RAG chatbot works, you can expand it by creating separate knowledge collections for different departments, adding a reverse proxy for SSO-like access control, or running multiple models for different tasks (fast model for chat, larger model for complex reasoning). This setup gives you a modern AI assistant while keeping your data inside your own environment.
