Why run a private AI chatbot?
If you like the convenience of ChatGPT-style assistants but need better privacy, lower latency on your local network, or predictable costs, a self-hosted setup is a strong option. With Ollama you can run modern large language models (LLMs) locally, and with Open WebUI you get a clean web interface for chatting, managing models, and organizing prompts. In this tutorial you will deploy both on an Ubuntu server using Docker, so the install is repeatable and easy to maintain.
What you will build
By the end, you will have:
1) Ollama running as a service (the model runtime)
2) Open WebUI running in Docker (the chat UI)
3) Persistent storage for models and chat data
4) Optional notes on GPU acceleration if your server has an NVIDIA GPU
Prerequisites
Use an Ubuntu 22.04/24.04 server (VM or bare metal). A modern CPU and at least 8 GB of RAM are workable for smaller models; 16–32 GB is more comfortable. You also need a user with sudo rights and outbound internet access to pull images and models; Docker itself is installed in Step 1. If you plan to expose the UI beyond your LAN, put it behind a reverse proxy with TLS.
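A quick way to confirm the basics before you start (output will differ per system): the commands below show the Ubuntu release, available memory, CPU core count, and free disk space.
Commands:
lsb_release -d
free -h
nproc
df -h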
Step 1: Install Docker and Docker Compose
First, install Docker and the Compose v2 plugin from Ubuntu’s repositories (simple and reliable for most homelab and SMB setups):
Commands:
sudo apt update
sudo apt install -y docker.io docker-compose-v2
sudo systemctl enable --now docker
Add your user to the docker group so you can run Docker without sudo (log out/in after this):
Command:
sudo usermod -aG docker $USER
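Once you have logged back in, a quick test confirms that Docker and the Compose plugin work without sudo (hello-world is Docker's official test image):
Commands:
docker --version
docker compose version
docker run --rm hello-world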
Step 2: Create folders for persistent data
Persistent volumes are important because LLM files can be large and you do not want to re-download models after every container update. Create a working directory:
Commands:
mkdir -p ~/ai-stack/ollama
mkdir -p ~/ai-stack/openwebui
cd ~/ai-stack
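Model files often run to several gigabytes each, so it is worth confirming that the filesystem holding ~/ai-stack has room to spare before you continue:
Command:
df -h ~/ai-stack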
Step 3: Create a Docker Compose file
Create a file named docker-compose.yml in ~/ai-stack. This setup runs Ollama and Open WebUI on the same Docker network. Ollama listens on port 11434 (the mapping below also publishes it on the host so you can call the API directly); Open WebUI is published on port 3000.
docker-compose.yml:
Copy and paste:
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - ./openwebui:/app/backend/data
    ports:
      - "3000:8080"
Step 4: Start the services
Bring the stack up in detached mode:
Command:
docker compose up -d
Verify containers are running:
Command:
docker ps
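If a container is missing from the list or keeps restarting, the logs usually explain why:
Commands:
docker compose logs -f ollama
docker compose logs -f openwebui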
Step 5: Open the Web UI and pull a model
In a browser, open:
http://YOUR_SERVER_IP:3000
Open WebUI will ask you to create an admin account on first run. After login, you can download models through the interface, or you can pull models from the server side using Ollama.
To pull a popular small model (good for testing), run:
Command:
docker exec -it ollama ollama pull llama3.2
Once the model is downloaded, refresh Open WebUI and select it for a chat. If you want a lighter footprint, try models with fewer parameters; if you need better answers, try larger models, keeping in mind that they require more RAM/VRAM.
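You can also sanity-check things from the command line. The first command lists the models Ollama has downloaded; the second sends a one-off prompt straight to Ollama's HTTP API on the host (the prompt text is just an example):
Commands:
docker exec -it ollama ollama list
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Explain Docker volumes in one sentence.", "stream": false}'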
Step 6: Basic troubleshooting (the common issues)
Open WebUI loads but shows no models: Confirm the environment variable points to Ollama. Run docker logs openwebui and make sure it can reach http://ollama:11434. Also verify Ollama is healthy with curl http://localhost:11434 on the host.
Model downloads are slow or fail: Check disk space (df -h) and DNS connectivity. LLM downloads can be multiple gigabytes, so a nearly full disk will cause strange errors.
High CPU and slow replies: This is normal on CPU-only servers with larger models. Use a smaller model, reduce concurrent users, or add GPU acceleration.
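The diagnostic commands referenced above, in one place (run them on the host):
Commands:
docker logs openwebui
curl http://localhost:11434
df -h
docker stats --no-stream
A healthy Ollama responds to the curl with a short "Ollama is running" message, and docker stats gives a quick view of per-container CPU and memory usage.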
Optional: NVIDIA GPU acceleration notes
If you have an NVIDIA GPU, install the NVIDIA driver and the NVIDIA Container Toolkit so Docker containers can access the GPU. Then adjust the Ollama service to request GPU resources (exact configuration depends on your Docker and driver versions). GPU support can dramatically improve response time and allow you to run larger models smoothly.
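A rough sketch of the typical sequence, assuming the NVIDIA driver is already installed and NVIDIA's apt repository for the Container Toolkit has been added (package names and repository setup can change, so follow NVIDIA's current documentation):
Commands:
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all ubuntu nvidia-smi
If the last command prints your GPU details, containers can see the GPU. You can then ask Compose to reserve it for the ollama service by adding a deploy block under that service in docker-compose.yml (indented to match the other keys under ollama):
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
After editing the file, run docker compose up -d again to recreate the container with GPU access.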
Step 7: Keep it secure and maintainable
For a safer deployment, do not expose port 3000 directly to the internet. Put Open WebUI behind Nginx or Caddy with HTTPS and authentication. For updates, pull new images and recreate containers:
Commands:
cd ~/ai-stack
docker compose pull
docker compose up -d
Because you used persistent volumes, your downloaded models and chat database stay intact across updates.
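If the reverse proxy runs on the same host, a common pattern is to bind Open WebUI to localhost only, so nothing outside the machine can reach port 3000 directly; the proxy then terminates HTTPS and forwards traffic to it. In docker-compose.yml that is a one-line change to the openwebui ports mapping (a sketch; adjust to your network layout):
ports:
  - "127.0.0.1:3000:8080"
Recreate the container with docker compose up -d after the change.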
Wrap-up
Running Ollama with Open WebUI on Ubuntu gives you a practical private AI chatbot you can use for internal documentation, code explanations, drafting emails, and brainstorming without sending prompts to a third-party cloud service. Start with a smaller model to confirm everything works, then scale up based on your hardware and the quality you need.