How to Run a Private AI Assistant on Linux with Ollama and Open WebUI (No Cloud Required)

Why a local AI assistant is worth it

Cloud AI tools are convenient, but they raise real concerns: sensitive prompts and internal documents leave your network, compliance rules can restrict what you are allowed to send, and subscription costs recur indefinitely. A local AI assistant addresses many of these issues by keeping your data on your own machine while still delivering fast, useful responses. In this tutorial, you will set up a private AI assistant on Linux using Ollama (for running local LLMs) and Open WebUI (a clean web interface that feels like a chat product). The result is a browser-based AI assistant available on your LAN, with no external API keys.

What you will build

By the end of this guide, you will have: (1) Ollama installed and running as a service, (2) at least one model downloaded and tested from the command line, and (3) Open WebUI running in Docker and connected to Ollama. This setup works well for homelabs, helpdesk teams, and developers who want a private “ChatGPT-style” assistant for drafting emails, explaining logs, generating scripts, and summarizing text.

Prerequisites

You need a modern Linux machine (Ubuntu 22.04/24.04, Debian 12, Fedora, etc.), at least 8 GB of RAM (16 GB recommended), and 20+ GB of free disk space depending on the models you download. A GPU is optional; many models run on CPU, just more slowly. You also need curl and Docker Engine for the WebUI portion.
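
If you are not sure a machine meets these requirements, two quick checks of available memory and free disk space can save you a failed model download later:

Commands:

free -h
df -h /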

Step 1: Install Ollama

Ollama provides a simple way to download and run LLMs locally. Install it with the official script:

Command: curl -fsSL https://ollama.com/install.sh | sh

After installation, confirm the service is running:

Command: systemctl status ollama

If your distro does not use systemd, you can still run Ollama manually, but most server installs will use systemd by default.
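
On such systems you can start the server manually in a terminal (or under whatever init system you use) and leave it running in the background:

Command: ollama serve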

Step 2: Download a model and test it

Next, pull a model. For general-purpose tasks, many people start with a small, fast instruction model. Try one of these based on your hardware: llama3.2 (smaller), qwen2.5 (strong reasoning), or another model you prefer. Example:

Command: ollama pull llama3.2

Now run a quick test in your terminal:

Command: ollama run llama3.2

Type a prompt like: Explain what DNS does in simple terms. If you get a response, Ollama is working correctly.
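
You can also confirm that the HTTP API on port 11434 answers, since Open WebUI will talk to Ollama the same way. Listing the installed models is a quick, non-interactive check:

Command: curl http://127.0.0.1:11434/api/tags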

Step 3: Install Docker (if needed)

Open WebUI is easiest to deploy with Docker. On Ubuntu, you can install Docker Engine from Docker's official repository, but for many lab environments the version packaged by the distribution is sufficient. If Docker is already installed, verify it:
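
As a sketch for Ubuntu or Debian, the distribution package plus enabling the service is usually enough for a lab box (swap in Docker's docker-ce repository if you want newer releases):

Commands:

sudo apt install -y docker.io
sudo systemctl enable --now docker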

Command: docker --version

Also ensure your user can run Docker without sudo (optional but convenient). If you add yourself to the docker group, log out and back in for it to take effect.
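
To add yourself to the group:

Command: sudo usermod -aG docker $USER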

Step 4: Run Open WebUI and connect it to Ollama

Ollama listens locally on port 11434 by default. Open WebUI will connect to it via an environment variable. Run Open WebUI like this:

Command:

docker run -d --name open-webui --restart unless-stopped -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main

On Linux, host.docker.internal does not resolve by default, which is why the command above adds --add-host=host.docker.internal:host-gateway. Even with that mapping, the WebUI can fail to reach Ollama because the Ollama service binds only to 127.0.0.1 unless OLLAMA_HOST is configured otherwise. If that happens, Docker’s host networking is a simple workaround for trusted LAN setups:

Alternative command:

docker run -d --name open-webui --restart unless-stopped --network=host -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main

After the container starts, open your browser and visit http://localhost:3000 (or the server’s IP with the same port). If you used the host-networking alternative, the UI listens on port 8080 instead, because no port mapping is applied. Create an admin account when prompted; Open WebUI should automatically detect the models you pulled with Ollama.
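
If the page does not load, check that the container is running and review its logs for startup errors:

Commands:

docker ps --filter name=open-webui
docker logs open-webui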

Step 5: Make it accessible on your LAN (optional)

If you want other machines to access the WebUI, ensure your firewall allows inbound TCP on port 3000. On Ubuntu with UFW, for example, allow it explicitly. Keep the service private to your LAN and avoid exposing it to the internet unless you add proper authentication, TLS, and a reverse proxy.
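
For example, with UFW you can limit access to a single subnet; adjust 192.168.1.0/24 to match your LAN, and use port 8080 instead of 3000 if you ran the container with host networking:

Command: sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp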

Troubleshooting tips

WebUI shows “No models”: Make sure you pulled a model with ollama pull and that Open WebUI can reach Ollama at the URL you set in OLLAMA_BASE_URL (http://127.0.0.1:11434 with host networking, http://host.docker.internal:11434 with the bridge setup). If you used bridge networking, test name resolution and connectivity from inside the container.
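
From the host, you can verify that Ollama itself is answering; from inside the container, a similar request (assuming curl is available in the image) tells you whether the bridge network can reach it:

Commands:

curl http://127.0.0.1:11434/api/tags
docker exec -it open-webui curl -s http://host.docker.internal:11434/api/tags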

Slow responses: CPU-only inference can be slow on larger models. Try a smaller model, close other heavy applications, or consider a machine with more RAM and a supported GPU.
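
To see which models are currently loaded and whether they are running on CPU or GPU, Ollama provides a status command:

Command: ollama ps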

Disk fills up quickly: Models are large. Remove unused ones with ollama rm <model> and keep an eye on your Docker volume usage as well.
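
For example, list what you have, remove a model you no longer need, and check how much space Docker is using:

Commands:

ollama list
ollama rm llama3.2
docker system df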

Next steps: making it production-friendly

Once the basics work, you can harden the deployment: put Open WebUI behind Nginx with HTTPS, restrict access by IP, and keep your models curated for your team’s use cases (helpdesk knowledge, scripting assistance, log explanation). The biggest win is control: your prompts stay local, your costs are predictable, and your assistant is always available—even when the internet is not.
