Why a private RAG chatbot?
If your team needs an internal chatbot that can answer questions from company documents, you’ve probably looked at cloud AI services. The problem is compliance: sending sensitive data outside your network can be a deal-breaker. A practical alternative is a private RAG setup (Retrieval-Augmented Generation), where a local language model generates answers while a local index retrieves relevant text from your own files. In this tutorial, you’ll build a private RAG chatbot on a Linux server using Ollama (local LLM runtime) and Open WebUI (a friendly chat interface), then connect your documents to it.
What you will build
You will deploy two services with Docker: Ollama to run a model locally, and Open WebUI to provide a web-based chat UI and document ingestion features. This approach is ideal for homelabs, IT departments, and helpdesk teams who want AI-assisted answers without exposing internal knowledge to third parties.
Prerequisites
You need a Linux server or VM (Ubuntu/Debian recommended) with at least 8 GB RAM for smaller models; 16 GB+ is better. Disk space depends on the model (expect several GB). You also need Docker and Docker Compose. If you have an NVIDIA GPU, you can accelerate inference, but this guide works on CPU as well.
Step 1: Install Docker and Docker Compose
On Ubuntu, install Docker and the Compose plugin from the distribution repositories, then enable the service:
Commands:
sudo apt update
sudo apt install -y docker.io docker-compose-v2
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
Log out and back in so your user can run Docker without sudo. Verify with:
Command:
docker version
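The Compose plugin ships as a Docker subcommand, so confirm it is available as well:
Command:
docker compose version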
Step 2: Create a Docker Compose file
Create a working directory and a compose file. This setup stores model files and WebUI data in persistent volumes so upgrades won’t wipe your configuration.
Commands:
mkdir -p ~/private-rag
cd ~/private-rag
nano docker-compose.yml
Paste the following content:
docker-compose.yml
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama
volumes:
  ollama:
  openwebui:
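If you plan to use an NVIDIA GPU, you can optionally give the ollama service access to it. This is a minimal sketch, assuming the NVIDIA Container Toolkit is already installed on the host; add it under the ollama service in the file above (indentation matters) and verify the details against the Ollama and Docker documentation for your versions:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]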
Step 3: Start the services
Bring the stack online:
Command:
docker compose up -d
Check that both containers are healthy:
Command:
docker ps
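If a container keeps restarting, the logs usually explain why (press Ctrl+C to stop following them):
Command:
docker compose logs -f open-webui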
Then open http://YOUR_SERVER_IP:3000 in your browser to reach Open WebUI. The first account you create becomes the admin by default, so choose a strong password.
Step 4: Pull a model with Ollama
You can pull models directly inside the Ollama container. A good starting point for many servers is a smaller, fast instruct model.
Command:
docker exec -it ollama ollama pull llama3.1:8b
If your server has less RAM, try a smaller model. If you have more resources, you can experiment with larger variants for better reasoning. After pulling, confirm it’s available:
Command:
docker exec -it ollama ollama list
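You can also run a one-off prompt from the command line to confirm the model loads and responds before touching the UI (the first run may take a while as the model is loaded into memory):
Command:
docker exec -it ollama ollama run llama3.1:8b "Reply with OK if you can read this."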
Step 5: Connect Open WebUI to the local model
In Open WebUI, go to the model selection menu and choose the model you pulled (for example, llama3.1:8b). Start a basic chat to confirm responses are generated locally. If it errors, verify that Open WebUI can reach Ollama on the internal Docker network and that the environment variable OLLAMA_BASE_URL matches the compose file.
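If you need to rule out a networking problem, you can query Ollama's API directly. Because the compose file publishes port 11434 on the host, a quick check from the server itself is:
Command:
curl http://localhost:11434/api/tags
This should return JSON listing the models you pulled; if it does not, the problem is on the Ollama side rather than in Open WebUI.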
Step 6: Enable RAG by adding your documents
To turn a general chatbot into a “knows our docs” assistant, ingest your content. In Open WebUI, find the section for Documents or Knowledge (wording may vary by version). Upload text-heavy sources such as internal runbooks, SOPs, FAQs, or exported wiki pages. For best retrieval results, prefer clean text formats like TXT, MD, PDF (machine-readable), and avoid scans without OCR.
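A quick way to check whether a PDF is machine-readable rather than a scan is to extract its text and see if anything sensible comes out. This sketch assumes the poppler-utils package, which provides pdftotext, and uses your-document.pdf as a placeholder filename:
Commands:
sudo apt install -y poppler-utils
pdftotext your-document.pdf - | head
If the output is empty or garbled, run the file through OCR before uploading it.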
After upload, Open WebUI will index the content so it can retrieve relevant chunks during chat. Test with a question that can only be answered from your document set, such as “What is our VPN reset procedure?” The response should cite or clearly reflect your internal wording. If the answer seems generic, add more targeted documents or refine your question.
Step 7: Secure access (quick hardening)
A private AI system can still leak data if it’s publicly exposed. First, bind access to trusted networks using a firewall (UFW on Ubuntu is simple; see the example rules after the update commands below) and consider putting Open WebUI behind a reverse proxy with HTTPS. Also, keep the services updated:
Commands:
cd ~/private-rag
docker compose pull
docker compose up -d
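For the firewall step mentioned above, a minimal UFW rule set might look like the following. The 192.168.1.0/24 subnet is only an example; substitute your own trusted network:
Commands:
sudo ufw allow 22/tcp
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw enable
One caveat: Docker publishes ports through its own iptables rules, which can bypass UFW. If you put Open WebUI behind a reverse proxy on the same host, consider binding the published ports to localhost in the compose file (for example "127.0.0.1:3000:8080") so they are not reachable from other machines at all.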
Finally, treat uploaded documents as sensitive: only allow authenticated users, and review what content is ingested. A RAG chatbot is powerful precisely because it can surface internal text quickly.
Troubleshooting tips
Model is slow: Use a smaller model, add RAM, or use GPU acceleration. Also reduce concurrent users.
WebUI can’t see the model: Confirm Ollama is running and reachable on port 11434 inside Docker, and that the model is listed in ollama list.
RAG answers are inaccurate: Upload more relevant documents, remove outdated versions, and prefer clean text sources. Retrieval quality depends heavily on document quality.
Next steps
Once your private RAG chatbot works, you can expand it by creating separate knowledge collections for different departments, adding a reverse proxy for SSO-like access control, or running multiple models for different tasks (fast model for chat, larger model for complex reasoning). This setup gives you a modern AI assistant while keeping your data inside your own environment.
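As a starting point for the multi-model idea, you can simply pull additional models and switch between them in the Open WebUI model selector. The tags below are examples from the Ollama library; the larger one needs far more RAM (or a GPU) than the 8B model used earlier:
Commands:
docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull llama3.1:70b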