Running an AI assistant locally is no longer just a hobby project. With today’s efficient open models and lightweight runtimes, you can build a private “ChatGPT-style” environment on a Linux machine without sending prompts to any cloud service. This tutorial shows how to install Ollama (for downloading and serving models) and Open WebUI (a clean web interface) on Ubuntu/Debian-based systems. The result is a fast local chatbot you can use for troubleshooting, documentation drafts, code explanations, and more—while keeping your data on your own hardware.
What You Will Build
By the end of this guide, you will have: (1) Ollama installed and running as a service, (2) at least one model pulled and tested from the terminal, and (3) Open WebUI running in Docker and connected to Ollama. This setup works well on a modern CPU, and it can be accelerated if you have a compatible GPU, but GPU support is optional for getting started.
Prerequisites
You need a Linux server or workstation (Ubuntu 22.04/24.04 or Debian 12 is ideal), a user with sudo access, and at least 8 GB RAM for smaller models (16 GB+ is recommended for smoother performance). You also need enough disk space for model files; many popular models take several gigabytes each.
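A quick way to check how much memory and disk space you have to work with before pulling models:
free -h    # total and available RAM
df -h /    # free space on the root filesystem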
Step 1: Update the System
First, update packages and reboot if your kernel or core libraries are upgraded:
Commands:
sudo apt update && sudo apt -y upgrade
sudo reboot
Step 2: Install Ollama
Ollama provides a simple way to download and run large language models locally. It also exposes an HTTP API on your machine, which Open WebUI can talk to.
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
After installation, confirm the service is running:
systemctl status ollama --no-pager
If it is not active, start and enable it:
sudo systemctl enable --now ollama
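As a quick sanity check, you can query the local API with curl; the root endpoint normally answers with a short status message:
curl http://127.0.0.1:11434   # typically prints "Ollama is running"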
Step 3: Pull a Model and Test It
Next, download a model. For many users, a good starting point is a smaller model that runs comfortably on CPU. Choose one that matches your hardware and use case.
Example (pull and run a model):
ollama pull llama3.2
ollama run llama3.2
Type a prompt such as: “Explain how DNS caching works in Linux.” If you get a sensible response, Ollama is working.
To see what models you have installed:
ollama list
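You can also pass a prompt directly on the command line instead of opening an interactive session, which is handy for quick tests and scripts (llama3.2 is simply the model pulled above; substitute whichever model you chose):
ollama run llama3.2 "Summarize what systemd-resolved does in one sentence."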
Step 4: Install Docker (for Open WebUI)
Open WebUI is easiest to deploy in a container. If Docker is not installed, add it using the official Ubuntu/Debian packages:
sudo apt -y install docker.io
sudo systemctl enable --now docker
Optional but convenient: allow your user to run Docker without sudo (log out and back in after this):
sudo usermod -aG docker $USER
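To confirm Docker itself is working before moving on, run the hello-world test image:
docker run --rm hello-world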
Step 5: Run Open WebUI and Connect It to Ollama
Ollama listens locally (typically on port 11434), and Open WebUI will be served on port 3000. The key is to give the container access to the host's Ollama API; the most reliable approach on Linux is host networking. Because host networking bypasses Docker's port mapping, the command below sets the container's PORT variable explicitly (Open WebUI otherwise defaults to port 8080).
Run Open WebUI:
docker run -d --name open-webui --restart unless-stopped \
  --network=host \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  -e PORT=3000 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
Now open your browser and go to:
http://YOUR_SERVER_IP:3000
Create the admin account when prompted. After login, Open WebUI should detect Ollama. If it does not, check the Ollama URL in settings and confirm Ollama is running.
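If the page does not load at all, the container logs usually point to the cause (for example, a port conflict or a wrong OLLAMA_BASE_URL):
docker logs --tail 50 open-webui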
Step 6: Basic Security and Network Tips
If this system is reachable over a network, treat it like any internal web service. At minimum, restrict access to port 3000 with a firewall or run it behind a reverse proxy with TLS. On Ubuntu, you can use UFW to allow only your LAN subnet or a specific IP.
Example (allow only a trusted subnet):
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw enable
If you are deploying for multiple users, consider placing Open WebUI behind Nginx with HTTPS and basic authentication or SSO, depending on your environment.
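As a rough sketch rather than a hardened configuration, an Nginx server block along these lines can terminate TLS and proxy to Open WebUI on port 3000; the hostname and certificate paths are placeholders you would replace with your own:
server {
    listen 443 ssl;
    server_name ai.example.internal;               # placeholder hostname
    ssl_certificate     /etc/ssl/certs/ai.crt;     # placeholder certificate
    ssl_certificate_key /etc/ssl/private/ai.key;   # placeholder key

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;    # allow WebSocket upgrades
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}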
Troubleshooting Checklist
Open WebUI loads but shows no models: Make sure you successfully ran ollama pull and that the container can reach http://127.0.0.1:11434 (see the quick API check after this list). Using --network=host usually fixes connectivity issues on Linux.
Slow responses: Try a smaller model, close other memory-heavy apps, or upgrade RAM. Local AI performance is heavily tied to available memory bandwidth and CPU speed when running without GPU.
Ollama service not running: Check logs with journalctl -u ollama -n 100 --no-pager and verify you have enough free disk space for the model cache.
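For the first item above, a quick check from the host is to list the models Ollama exposes over its API; an empty list means nothing has been pulled yet:
curl http://127.0.0.1:11434/api/tags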
Conclusion
With Ollama and Open WebUI, you can run a practical AI assistant entirely on your own Linux system. This approach is ideal for homelabs, IT teams, and privacy-focused users who want modern AI capabilities without exposing internal prompts or data to external providers. Once the basics are working, you can experiment with different models, create custom system prompts for your helpdesk workflow, and even integrate the Ollama API into scripts and internal tools.
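As a starting point for scripting, a minimal non-streaming request to the Ollama API could look like the following; the model name and prompt are only examples:
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a one-line description of the uptime command.",
  "stream": false
}'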