Create a Private ChatGPT-Style AI Assistant on Linux with Ollama and Open WebUI (No Cloud Required)

Running an AI assistant locally is no longer a niche experiment. With modern open models and lightweight serving tools, you can build a private, ChatGPT-style interface on your own Linux machine—no API keys, no data leaving your network, and full control over updates. In this tutorial, you will install Ollama (for downloading and serving LLMs) and Open WebUI (a clean web interface) using Docker. The result is a fast, self-hosted AI chat you can use for drafting, troubleshooting, and internal knowledge work.

What You’ll Build

By the end, you will have: (1) Ollama running as a local model server, (2) Open WebUI running in a container, and (3) a browser-based chat UI available on your LAN or localhost. This setup works well on Ubuntu Server, Debian, and most modern Linux distributions.

Prerequisites

Hardware: At least 8 GB of RAM is recommended for smaller models; 16 GB or more gives you headroom for larger ones. A GPU speeds up inference considerably but is not required; CPU-only inference works, just more slowly.

Software: A Linux system with sudo access, Docker installed, and basic command-line familiarity. If you don’t have Docker yet, install it via your distribution’s official Docker instructions.
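Before continuing, it's worth confirming Docker is actually ready. The snippet below is a minimal, read-only sanity check; it changes nothing on the system:

```shell
# Confirm the Docker client is installed and the daemon is reachable.
if command -v docker >/dev/null 2>&1; then
  docker --version
  if docker info >/dev/null 2>&1; then
    echo "Docker daemon is running"
  else
    echo "Docker is installed, but the daemon is not reachable (try: sudo systemctl start docker)"
  fi
else
  echo "Docker is not installed"
fi
```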

Step 1: Install and Start Ollama

Ollama provides a simple way to download and run large language models locally. Install it with the official script:

curl -fsSL https://ollama.com/install.sh | sh

After installation, start and enable the service (on most systemd-based systems):

sudo systemctl enable --now ollama

Confirm it is active:

systemctl status ollama
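Beyond the service status, you can probe the HTTP API directly. Ollama listens on port 11434 by default, and its root path answers a plain GET with the string "Ollama is running":

```shell
# Query the Ollama API root; a healthy server replies "Ollama is running".
resp=$(curl -fsS --max-time 5 http://127.0.0.1:11434 2>/dev/null || true)
if [ "$resp" = "Ollama is running" ]; then
  echo "Ollama API is up"
else
  echo "Ollama API not reachable; check: systemctl status ollama"
fi
```

This is the same endpoint Open WebUI will talk to later, so it is a useful baseline check.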

Step 2: Download a Model

Now you’ll pull a model. Choose one that matches your hardware. A balanced option on many systems is Llama 3 (the exact sizes and tags on offer depend on the Ollama model library at the time you pull). Expect the download to take a while, since model files run to several gigabytes.

ollama pull llama3

Test it directly in the terminal:

ollama run llama3

Type a short prompt like “Explain RAID 1 in simple terms” to confirm it responds. Exit the session with /bye (or Ctrl+D) when done.
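If you prefer a scriptable check over the interactive session, you can pass the prompt on the command line or call the generate endpoint directly. Both examples assume the llama3 model has already been pulled:

```shell
# One-shot prompt without entering the interactive REPL.
ollama run llama3 "Explain RAID 1 in one sentence."

# Equivalent request against the HTTP API (stream disabled for a single JSON reply).
curl -s http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain RAID 1 in one sentence.",
  "stream": false
}'
```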

Step 3: Run Open WebUI in Docker

Open WebUI provides a user-friendly interface that feels similar to popular AI chat apps. It can connect to Ollama running on the host. Start by creating a persistent volume for WebUI data:

docker volume create open-webui

Then run the container. The key setting is the environment variable that tells WebUI where to find Ollama:

docker run -d \
--name open-webui \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main

On Linux, host.docker.internal does not resolve inside containers by default (it is a Docker Desktop convenience, not part of the standard Linux engine). If Open WebUI can’t connect to Ollama, recreate the container with an extra host mapping:

docker rm -f open-webui

docker run -d \
--name open-webui \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--restart unless-stopped \
ghcr.io/open-webui/open-webui:main
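A quick way to confirm the container came up is to check its status and probe the published port. This sketch assumes the 3000:8080 port mapping from the command above; the first startup can take a minute while the UI initializes:

```shell
# Is the container running?
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}'

# Does the web interface answer? (Prints 200 once startup has finished, 000 if nothing is listening.)
curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 http://127.0.0.1:3000

# If anything looks wrong, the logs usually say why.
docker logs --tail 20 open-webui
```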

Step 4: Open the Web Interface

In your browser, open:

http://localhost:3000

If you’re accessing it from another computer on the same network, replace localhost with your Linux server’s IP address. The first time you open Open WebUI, you will create an admin account. After logging in, open the model selector and choose the model you pulled (for example, llama3).

Step 5: Basic Security Hardening

A local AI chat can contain sensitive data, so treat it like an internal app. If this is only for you, bind the published port to localhost by replacing -p 3000:8080 in the docker run command with:

-p 127.0.0.1:3000:8080

If you need LAN access, consider placing it behind a reverse proxy (Nginx or Caddy) with HTTPS and authentication. Also make sure your firewall only allows trusted networks to reach port 3000.
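With ufw, for example, you can restrict port 3000 to a single trusted subnet. The 192.168.1.0/24 range below is a placeholder; substitute your own LAN (ufw evaluates rules in order, so the allow must be added before the deny):

```shell
# Allow only the trusted subnet (placeholder range) to reach Open WebUI...
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp

# ...and block everyone else, then review the result.
sudo ufw deny 3000/tcp
sudo ufw status numbered
```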

Troubleshooting Tips

WebUI shows no models: Confirm Ollama is running and reachable. Try curl http://127.0.0.1:11434 on the host, then check the container logs with docker logs open-webui.

Slow responses: Use a smaller model, close other memory-heavy services, or run on a machine with more RAM. CPU-only inference is usable, but performance varies widely by hardware.

Connection errors from container to host: Use the --add-host=host.docker.internal:host-gateway option shown earlier, and keep the OLLAMA_BASE_URL pointing to http://host.docker.internal:11434.
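The checks above can be rolled into one small diagnostic script. It is a sketch that assumes the default ports used in this tutorial (11434 for Ollama, 3000 for Open WebUI):

```shell
#!/bin/sh
# Minimal health check for the two services in this tutorial.
check() {
  name=$1; url=$2
  if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
    echo "OK    $name ($url)"
  else
    echo "FAIL  $name ($url)"
  fi
}

check "Ollama API" http://127.0.0.1:11434
check "Open WebUI" http://127.0.0.1:3000
```

A FAIL on the first line points at the Ollama service itself; a FAIL only on the second points at the container or its port mapping.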

Next Steps

Once your private AI assistant is working, you can expand it with additional models for different tasks, create separate chats for projects, and experiment with system prompts for consistent tone and formatting. The big advantage of this setup is control: you decide what runs, what gets stored, and how it’s exposed—without relying on external services.
