Run Local AI on Linux with Ollama and Open WebUI (No Cloud Required)

Running an AI assistant locally is no longer just a hobby project. With today’s efficient open models and lightweight runtimes, you can build a private “ChatGPT-style” environment on a Linux machine without sending prompts to any cloud service. This tutorial shows how to install Ollama (for downloading and serving models) and Open WebUI (a clean web interface) on Ubuntu/Debian-based systems. The result is a fast local chatbot you can use for troubleshooting, documentation drafts, code explanations, and more—while keeping your data on your own hardware.

What You Will Build

By the end of this guide, you will have: (1) Ollama installed and running as a service, (2) at least one model pulled and tested from the terminal, and (3) Open WebUI running in Docker and connected to Ollama. This setup works well on a modern CPU, and it can be accelerated if you have a compatible GPU, but GPU support is optional for getting started.

Prerequisites

You need a Linux server or workstation (Ubuntu 22.04/24.04 or Debian 12 is ideal), a user with sudo access, and at least 8 GB RAM for smaller models (16 GB+ is recommended for smoother performance). You also need enough disk space for model files; many popular models take several gigabytes each.
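If you are not sure what a machine has, a quick check of memory, disk, and CPU before you start can save time later. The commands below are standard utilities on Ubuntu/Debian:

Example (check available resources):

free -h     # total and available RAM
df -h /     # free disk space on the root filesystem
nproc       # number of CPU cores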

Step 1: Update the System

First, update packages and reboot if your kernel or core libraries are upgraded:

Commands:

sudo apt update && sudo apt -y upgrade
sudo reboot
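On Ubuntu, packages that require a restart create a marker file, so you can check whether the reboot is actually needed before running it (Debian may not create this file unless similar update tooling is installed):

Example (check for a pending reboot):

[ -f /var/run/reboot-required ] && echo "Reboot required" || echo "No reboot needed"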

Step 2: Install Ollama

Ollama provides a simple way to download and run large language models locally. It also exposes an HTTP API on your machine, which Open WebUI can talk to.

Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

After installation, confirm the service is running:

systemctl status ollama --no-pager

If it is not active, start and enable it:

sudo systemctl enable --now ollama
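At this point Ollama should also answer on its local HTTP API, which is what Open WebUI will use later. A quick check with curl (assumes curl is installed; 127.0.0.1:11434 is Ollama's default listen address):

Example (query the Ollama API):

curl http://127.0.0.1:11434/api/version   # returns the installed Ollama version
curl http://127.0.0.1:11434/api/tags      # lists locally available models (empty until Step 3)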

Step 3: Pull a Model and Test It

Next, download a model. For many users, a good starting point is a smaller model that runs comfortably on CPU. Choose one that matches your hardware and use case.

Example (pull and run a model):

ollama pull llama3.2
ollama run llama3.2

Type a prompt such as: “Explain how DNS caching works in Linux.” If you get a sensible response, Ollama is working. Type /bye (or press Ctrl+D) to leave the interactive session.

To see what models you have installed:

ollama list
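The model is also reachable through the HTTP API, which is useful for scripting and is exactly how Open WebUI will talk to it. A minimal check with curl, assuming the llama3.2 model pulled above:

Example (one-off generation via the API):

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain how DNS caching works in Linux.",
  "stream": false
}'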

Step 4: Install Docker (for Open WebUI)

Open WebUI is easiest to deploy in a container. If Docker is not installed, install it from the standard Ubuntu/Debian repositories (the docker.io package):

sudo apt -y install docker.io
sudo systemctl enable --now docker

Optional but convenient: allow your user to run Docker without sudo (log out and back in after this):

sudo usermod -aG docker $USER
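Before moving on, a quick smoke test confirms Docker itself is working (hello-world is Docker's own tiny test image). If you have not logged out and back in yet, prefix the docker commands with sudo:

Example (verify Docker):

docker --version
docker run --rm hello-world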

Step 5: Run Open WebUI and Connect It to Ollama

Ollama listens locally on port 11434 by default. The key is to give the Open WebUI container access to the host’s Ollama API, and the most reliable approach on Linux is host networking. Note that with host networking the usual container port mapping does not apply, so Open WebUI serves its interface directly on its default port, 8080.

Run Open WebUI:

docker run -d --name open-webui --restart unless-stopped --network=host -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main

Now open your browser and go to:

http://YOUR_SERVER_IP:8080

Create the admin account when prompted. After login, Open WebUI should detect Ollama. If it does not, check the Ollama URL in settings and confirm Ollama is running.
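If the page does not load or no models appear, the container’s state and logs are the first place to look (the name matches the docker run command above):

Example (inspect the Open WebUI container):

docker ps --filter name=open-webui      # should show the container as Up
docker logs --tail 50 open-webui        # recent startup and connection messages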

Step 6: Basic Security and Network Tips

If this system is reachable over a network, treat it like any internal web service. At minimum, restrict access to port 8080 with a firewall or run it behind a reverse proxy with TLS. On Ubuntu, you can use UFW to allow only your LAN subnet or a specific IP.

Example (allow only a trusted subnet; if you manage the machine over SSH, allow that first so enabling the firewall does not lock you out):

sudo ufw allow OpenSSH
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw enable

If you are deploying for multiple users, consider placing Open WebUI behind Nginx with HTTPS and basic authentication or SSO, depending on your environment.
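As a rough sketch of that reverse-proxy idea, the commands below write a minimal Nginx site that terminates TLS and forwards traffic to Open WebUI on 127.0.0.1:8080. The server name and certificate paths are placeholders, and the sketch assumes Nginx is already installed with certificates obtained separately (for example via certbot); adapt it to your environment before relying on it:

Example (minimal Nginx reverse proxy):

sudo tee /etc/nginx/sites-available/open-webui >/dev/null <<'EOF'
server {
    listen 443 ssl;
    server_name ai.example.internal;                               # placeholder hostname

    ssl_certificate     /etc/ssl/certs/ai.example.internal.crt;    # placeholder certificate
    ssl_certificate_key /etc/ssl/private/ai.example.internal.key;  # placeholder key

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # WebSocket headers so streaming chat responses work through the proxy
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx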

Troubleshooting Checklist

Open WebUI loads but shows no models: Make sure you successfully ran ollama pull and that the container can reach http://127.0.0.1:11434. Using --network=host usually fixes connectivity issues on Linux.

Slow responses: Try a smaller model, close other memory-heavy apps, or upgrade RAM. Local AI performance is heavily tied to available memory bandwidth and CPU speed when running without GPU.

Ollama service not running: Check logs with journalctl -u ollama -n 100 --no-pager and verify you have enough free disk space for the model cache.

Conclusion

With Ollama and Open WebUI, you can run a practical AI assistant entirely on your own Linux system. This approach is ideal for homelabs, IT teams, and privacy-focused users who want modern AI capabilities without exposing internal prompts or data to external providers. Once the basics are working, you can experiment with different models, create custom system prompts for your helpdesk workflow, and even integrate the Ollama API into scripts and internal tools.
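As one example of the script integration mentioned above, the small wrapper below (a hypothetical helper, not part of Ollama) sends a single question to the local chat API and prints the reply. It assumes the llama3.2 model from earlier and uses jq for JSON handling (sudo apt -y install jq):

Example (ask.sh, a one-off question from the shell):

#!/usr/bin/env bash
# ask.sh - send a single question to the local Ollama chat API and print the answer
# Usage: ./ask.sh "How do I list open ports on Linux?"
set -euo pipefail

MODEL="llama3.2"                        # any model shown by `ollama list`
PROMPT="${1:?missing prompt argument}"

# Build the JSON body with jq so quotes in the prompt are escaped safely,
# then extract only the assistant's reply from the response.
curl -s http://127.0.0.1:11434/api/chat \
  -d "$(jq -n --arg model "$MODEL" --arg prompt "$PROMPT" \
        '{model: $model, stream: false, messages: [{role: "user", content: $prompt}]}')" \
  | jq -r '.message.content'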
