Running an AI assistant locally is quickly becoming a practical option for developers and IT teams who want faster responses, offline access, and better control over sensitive code. In this tutorial, you will set up a private, local “ChatGPT-like” interface on a Linux server or workstation using Ollama (to run large language models) and Open WebUI (a web interface you can access from a browser). The result is a self-hosted AI assistant you can use for coding help, troubleshooting, and documentation drafts—without sending prompts to a third-party cloud.
This guide focuses on modern Linux distributions (Ubuntu/Debian-based commands are shown). The same approach works on many other distros with minor package differences. You’ll also learn basic hardening steps so the UI is not accidentally exposed to the internet.
Prerequisites
Hardware: A machine with at least 8 GB of RAM is workable for smaller models, but 16 GB+ is recommended. A GPU is optional; many models run on CPU, just more slowly. A quick way to check what you have is shown below.
Software: Linux with sudo access, and either Docker (recommended) or Python for Open WebUI. You’ll also want an SSH session if you’re setting this up on a server.
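Not sure what the machine has? A quick check before choosing a model can save you a large download. These are standard tools on most distributions (lspci comes from the pciutils package and may need installing on minimal servers):
free -h            # total and available RAM
nproc              # CPU core count
lspci | grep -iE 'vga|3d|nvidia'   # check for a discrete GPU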
Step 1: Install Ollama
Ollama is a lightweight runtime that downloads and runs models locally. Install it with the official script:
Command:
curl -fsSL https://ollama.com/install.sh | sh
After installation, verify it’s working:
ollama --version
Now pull a model. For a good balance of speed and capability, try a smaller modern model first:
ollama pull llama3.1
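The default llama3.1 tag is a multi-gigabyte download, so give it a few minutes and make sure you have the disk space. When the pull finishes, confirm the model is available locally:
ollama list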
Test a quick prompt in the terminal:
ollama run llama3.1
Type a question, press Enter, and confirm you get a response. Exit the interactive session with /bye (or Ctrl+D).
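You can also pass a prompt directly for a one-shot answer, or query Ollama’s HTTP API on port 11434, which is the same endpoint Open WebUI will use in Step 3 (the prompt text below is just an example):
ollama run llama3.1 "Summarize what the sticky bit does on a Linux directory"
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Write a bash one-liner to find files larger than 100 MB", "stream": false}'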
Step 2: Install Docker (Recommended)
Open WebUI can be installed in several ways, but Docker keeps it clean and easy to upgrade. Install Docker if you don’t already have it:
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker
Optional but useful: allow your user to run Docker without sudo (log out and back in afterward):
sudo usermod -aG docker $USER
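A quick sanity check confirms Docker is installed and that your user can reach the daemon (log out and back in first if you just added yourself to the docker group):
docker --version
docker run --rm hello-world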
Step 3: Run Open WebUI and Connect It to Ollama
Open WebUI will provide a browser-based chat interface. Start it with Docker. The easiest approach is to map the container port to your host and point it at the Ollama API.
First, confirm Ollama is running. On many systems it runs as a service automatically after installation. You can check:
sudo systemctl status ollama
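Whether or not Ollama runs under systemd on your setup, a direct request to the API is a reliable check; it should return a small JSON blob with the version number:
curl http://127.0.0.1:11434/api/version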
Now run Open WebUI:
docker run -d --name open-webui \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
On Linux, host.docker.internal only resolves because of the --add-host flag above. If it still doesn’t work with your Docker setup, point Open WebUI at the Docker bridge gateway IP or the server’s own IP instead:
-e OLLAMA_BASE_URL=http://172.17.0.1:11434
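Two details are worth knowing here. The bridge gateway IP is usually 172.17.0.1 but can differ, and the Ollama service installed by the script listens only on 127.0.0.1 by default, so connections arriving from the Docker bridge may be refused. The sketch below shows how to look up the gateway address and, if needed, make Ollama listen on all interfaces via a systemd override (only do this on a firewalled or trusted network):
docker network inspect bridge --format '{{(index .IPAM.Config 0).Gateway}}'
sudo systemctl edit ollama
# In the editor that opens, add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama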
Then open your browser to:
http://localhost:3000
Create the admin account on first run. After logging in, you should see the available Ollama models; if you already pulled llama3.1, it will appear in the model selector.
Step 4: Make It Safe (Local Network Access Without Public Exposure)
By default, -p 3000:8080 publishes the UI on all network interfaces, so anyone who can reach the machine on port 3000 gets the login page. If this is a server, you typically want LAN-only or localhost-only access. A simple method is to bind the published port to a specific interface. For local-only access, recreate the container:
docker rm -f open-webui
docker run -d --name open-webui \
-p 127.0.0.1:3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
Note that only the published port changes; the OLLAMA_BASE_URL stays as before, because inside the container 127.0.0.1 refers to the container itself, not the host.
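To confirm the UI is now reachable only on the loopback interface, check the listening socket from the host (ss is part of iproute2 and present on modern distributions):
sudo ss -tlnp | grep 3000
The local address column should show 127.0.0.1:3000 rather than 0.0.0.0:3000.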
If you need access from another PC, consider using an SSH tunnel instead of opening a firewall port:
ssh -L 3000:127.0.0.1:3000 user@your-server
Then browse to http://localhost:3000 on your local machine.
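If tunneling isn’t practical and you decide to allow direct LAN access instead, publish the port on a LAN-reachable interface and restrict it to your local subnet at the firewall. The example below assumes UFW and a 192.168.1.0/24 network; adjust both to your environment, and verify from another machine that the rule actually applies, since Docker’s own iptables rules for published ports can bypass UFW:
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp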
Step 5: Common Troubleshooting
Model not showing up: Make sure the model is installed with ollama list. If it isn’t listed, run ollama pull <model>.
Open WebUI can’t connect to Ollama: Confirm Ollama is listening on port 11434 and is reachable from the container (see the OLLAMA_HOST note in Step 3). Check logs with docker logs open-webui. If needed, try the Docker bridge gateway IP (172.17.0.1) as the base URL.
Slow responses: Use a smaller model, close other memory-heavy apps, or run on a machine with more RAM. CPU-only inference is normal but slower.
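A few commands from the host cover most of these diagnostics:
journalctl -u ollama --since "15 minutes ago"   # Ollama service logs
docker logs --tail 50 open-webui                # Open WebUI container logs
curl http://127.0.0.1:11434/api/tags            # installed models as JSON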
Final Notes
With Ollama and Open WebUI, you get a practical local AI assistant that can help write scripts, explain logs, draft runbooks, and speed up troubleshooting—while keeping prompts on your own hardware. Once it’s running, experiment with different models and create reusable “system prompts” for tasks like helpdesk triage, Linux administration, or code review.
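One convenient way to package a reusable system prompt is an Ollama Modelfile, which bakes instructions into a named model that then shows up in Open WebUI’s model picker. A minimal sketch (the model name and prompt text are just examples):
cat > Modelfile <<'EOF'
FROM llama3.1
SYSTEM """You are a concise Linux administration assistant. Prefer shell one-liners and flag any destructive command before suggesting it."""
EOF
ollama create linux-admin -f Modelfile
ollama run linux-admin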