Running a private AI assistant locally is becoming a practical option for developers and IT teams who want faster responses, lower cloud costs, and better control over sensitive code. In this tutorial, you will set up a self-hosted AI “code helper” on a Linux server using Ollama (for running large language models locally) and Open WebUI (a clean web interface). The result is a browser-based assistant you can use for code reviews, script generation, troubleshooting, and documentation drafts—without sending prompts to external services.
What You’ll Build
You will deploy two components: Ollama, which downloads and serves models via a local API, and Open WebUI, which connects to Ollama and provides a chat UI with conversation history. This guide uses Docker to keep the installation clean and easy to update.
Prerequisites
Before you start, prepare a Linux machine (Ubuntu 22.04/24.04, Debian 12, or similar) with at least 8 GB RAM (16 GB is better for larger models) and 20+ GB free disk. A GPU is optional, but a modern CPU works fine for smaller models. You also need Docker and Docker Compose (or the Docker Compose plugin).
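To confirm the machine meets these requirements, you can run a quick check with standard Linux tools (memory, root-filesystem free space, and CPU core count):
free -h
df -h /
nproc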
Step 1: Install Docker (Ubuntu/Debian)
If Docker is not installed, run the commands below. On other distributions, use the official Docker documentation for your package manager.
Commands:
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Optional but recommended: allow your user to run Docker without sudo.
sudo usermod -aG docker $USER
Log out and back in after changing group membership.
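To verify the installation and your new group membership, run a quick test (hello-world is Docker's standard test image):
docker --version
docker compose version
docker run hello-world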
Step 2: Create a Project Directory
Create a dedicated folder for your deployment so configuration and volumes stay organized.
mkdir -p ~/private-ai && cd ~/private-ai
Step 3: Create a Docker Compose File
Create a file named docker-compose.yml with the content below. It starts Ollama and Open WebUI, stores model data on disk, and makes the web UI available on port 3000.
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama
volumes:
  ollama:
  openwebui:
EOF
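Before starting anything, you can ask Compose to parse the file and print the merged configuration; if the YAML indentation is off, this command fails with an error instead:
docker compose config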
Step 4: Start the Services
Bring the stack up in the background and confirm both containers are healthy.
docker compose up -d
docker ps
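You can also confirm the Ollama API is answering from the host. With port 11434 published as in the Compose file above, the version endpoint returns a short JSON response:
curl http://localhost:11434/api/version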
Now open your browser and go to http://YOUR_SERVER_IP:3000. In current versions of Open WebUI, the first account you create becomes the administrator.
Step 5: Download a Model with Ollama
Ollama pulls models on demand. For a lightweight, code-focused start, try a smaller model first. Run the commands below to download a model and start an interactive test session inside the Ollama container (type /bye to leave the prompt).
docker exec -it ollama ollama pull codellama:7b
docker exec -it ollama ollama run codellama:7b
If you prefer a general assistant model, you can also try:
docker exec -it ollama ollama pull llama3.1:8b
Once pulled, go back to Open WebUI, start a new chat, and select the model. Your prompts will be processed locally on your server.
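Because Ollama exposes a local REST API, you can also script against it directly from the host. For example, the first command below lists the models you have downloaded and the second sends a one-off prompt (adjust the model name to whichever model you pulled):
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/generate -d '{"model": "codellama:7b", "prompt": "Write a bash one-liner that counts files in a directory", "stream": false}'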
Step 6: Basic Hardening and Access Tips
If this server is not strictly internal, place Open WebUI behind a reverse proxy such as Nginx or Caddy and enable HTTPS. At a minimum, restrict access with a firewall so only your office IP/VPN can reach port 3000. On Ubuntu with UFW, you can allow only your admin workstation and block the rest.
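If you manage this server over SSH, allow SSH before enabling UFW so the firewall does not lock you out (use sudo ufw allow 22/tcp if the OpenSSH application profile is not registered):
sudo ufw allow OpenSSH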
sudo ufw allow from YOUR_IP to any port 3000 proto tcp
sudo ufw enable
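If you take the reverse-proxy route instead, a minimal Caddyfile like the following proxies a hypothetical domain to Open WebUI and lets Caddy obtain HTTPS certificates automatically (replace ai.example.com with a DNS name that points at this server; ports 80 and 443 must be reachable for certificate issuance):
ai.example.com {
    reverse_proxy localhost:3000
}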
Troubleshooting Common Problems
Open WebUI can’t see Ollama models: confirm that the OLLAMA_BASE_URL environment variable points to http://ollama:11434 (container-to-container networking), and check the Ollama logs with docker logs ollama to verify it started and is listening on port 11434.
Slow responses: smaller models respond faster on CPU. Also check system load and RAM usage. If the machine is swapping heavily, upgrade RAM or choose a smaller model.
Disk usage grows quickly: model files are large. Keep an eye on volumes and remove unused models with docker exec -it ollama ollama list and docker exec -it ollama ollama rm MODELNAME.
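To see where the space is going, Docker can report per-volume usage (the ollama volume holds the model files):
docker system df -v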
Conclusion
With Ollama and Open WebUI, you can run a capable private AI code assistant on your own Linux server in under an hour. This setup is ideal for testing prompts safely, speeding up daily scripting tasks, and keeping sensitive code and logs under your control. Once it’s running, you can experiment with different models, tighten access via HTTPS and VPN, and even dedicate a GPU host later for faster generation.