How to Run a Local LLM with Ollama and Open WebUI on Linux (Private AI Chat in Minutes)

Running a large language model (LLM) locally is one of the fastest ways to get private, low-latency AI assistance without sending your prompts to a third-party cloud. In this tutorial, you will set up Ollama (a lightweight LLM runtime) and Open WebUI (a clean web interface) on Linux. The result is a self-hosted AI chat you can use for drafting, coding help, log analysis, and knowledge base searching—while keeping data on your own machine.

What You Need

Hardware: A modern CPU-only system works, but more RAM helps a lot. For small models (7B–8B parameters), aim for 8–16 GB of RAM. For smoother performance or larger models, 32 GB+ is recommended. If you have an NVIDIA GPU, you can accelerate generation, but this guide focuses on a reliable CPU-first setup.

Software: A recent Linux distribution (Ubuntu, Debian, or Fedora), terminal access, and Docker (recommended for Open WebUI); a working Python environment is an alternative if you prefer a manual setup.

Step 1: Install Ollama

Ollama makes local model management simple: you download a model once and then run it with a single command. To install Ollama, open a terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

After installation, verify it works:

ollama --version

On most systems, Ollama starts as a service automatically. If you need to start it manually, you can run:

ollama serve
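
If your distro uses systemd, the install script typically registers Ollama as a service named ollama; assuming that is the case on your machine, you can check or restart it like this:

systemctl status ollama
sudo systemctl restart ollama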

Step 2: Pull a Model (Example: Llama 3.1)

Now download a model. A good starting point is a modern 7B or 8B model; the default llama3.1 tag pulls the 8B variant. Pull it with:

ollama pull llama3.1

Once it finishes, test a quick prompt directly in the terminal:

ollama run llama3.1

Type a message (for example, “Summarize the difference between TCP and UDP”) and press Enter. If you get a response, the local model runtime is working.
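
Type /bye to exit the interactive session. You can also exercise Ollama's HTTP API directly with curl, which is what Open WebUI will do later; the prompt below is just an example:

curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Explain the difference between TCP and UDP in one sentence.", "stream": false}'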

Step 3: Install Docker (for Open WebUI)

Open WebUI is easiest to run in a container. If Docker is not installed, on Ubuntu/Debian you can do:

sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker

Optional but recommended: allow your user to run Docker without sudo:

sudo usermod -aG docker $USER

Log out and back in for the group change to apply.
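
To confirm the group change took effect, run a quick smoke test without sudo:

docker run --rm hello-world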

Step 4: Run Open WebUI and Connect It to Ollama

Start Open WebUI with Docker. This command creates persistent storage and publishes the web interface on port 3000:

docker run -d --name open-webui -p 3000:8080 -v open-webui:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:main
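
Before moving on, verify the container started cleanly by checking its status and logs:

docker ps
docker logs open-webui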

Next, ensure Open WebUI can reach Ollama. If Open WebUI does not detect it automatically, the most common fix is to point it at the Ollama API endpoint. Ollama listens on http://localhost:11434 by default, but with Docker's default bridge networking, “localhost” inside the container refers to the container itself, not to the host machine.

A practical approach is to run Open WebUI using host networking (Linux only). Stop the existing container and re-run:

docker rm -f open-webui
docker run -d --name open-webui --network=host -v open-webui:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:main

Now open your browser. With the host-networking command above, the port mapping no longer applies, so Open WebUI listens on its default port:

http://localhost:8080

(If you kept the first, port-mapped command instead, the interface is at http://localhost:3000.)

Create an admin account when prompted. In the Open WebUI settings, you should see Ollama as an available provider. Select the model you pulled (for example, llama3.1) and start chatting.

Step 5: Improve Performance and Reliability

Choose the right model size: If responses feel slow, try a smaller model. Ollama supports many options; you can keep multiple models and switch depending on the task. Smaller models are great for quick drafts, command explanations, and lightweight Q&A.
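
For example, you can pull a smaller model alongside the 8B one and switch between them per task; the tag below is one option from the Ollama library, so check what is currently available:

ollama pull llama3.2:3b
ollama list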

Keep your data private: Local LLMs are only “private” if you avoid sending data out through plugins or external integrations. Treat the WebUI like any internal tool: secure access, avoid exposing it to the public internet, and consider a reverse proxy with authentication if you need remote access.
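
For example, if you use UFW and want only your LAN to reach the interface, rules along these lines work; the 192.168.1.0/24 subnet and port 8080 are assumptions you should adapt to your own network and setup:

sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw deny 8080/tcp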

Troubleshoot connectivity: If Open WebUI can’t see Ollama, confirm the Ollama service is running and listening on port 11434:

ss -tulpn | grep 11434
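
If the port is listening, you can also query the API itself; this endpoint returns the models Ollama currently has available:

curl http://localhost:11434/api/tags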

If you prefer not to use host networking, you can instead configure Ollama to listen on an address reachable from inside the container and point Open WebUI at that address, as sketched below. The exact steps depend on your distro and firewall rules, which is why host networking is the fastest baseline for validating your setup.
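
As a sketch, assuming a systemd-managed Ollama service: set OLLAMA_HOST so Ollama listens on all interfaces, then re-create the Open WebUI container with its OLLAMA_BASE_URL pointing at Docker's host gateway. Binding to 0.0.0.0 exposes the Ollama API to your network, so firewall it accordingly. First open a systemd override:

sudo systemctl edit ollama

Add these lines to the override file and save:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then restart Ollama and re-run Open WebUI with the extra options:

sudo systemctl restart ollama
docker rm -f open-webui
docker run -d --name open-webui -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:main

With this variant, the interface is back on http://localhost:3000.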

Next Steps (Useful Ideas)

Once your local AI chat is stable, you can level it up: create model presets for different writing styles, connect it to internal documentation, or use it for structured tasks like generating incident summaries from sanitized logs. The biggest advantage of this setup is control—you decide what runs, where it runs, and what data it can access.
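
For instance, Ollama's Modelfile format lets you bake a system prompt into a reusable preset; the preset name and prompt below are placeholders. Create a file named Modelfile with this content:

FROM llama3.1
SYSTEM "You are a concise technical writing assistant. Answer in short, plain sentences."

Then build and run the preset:

ollama create writing-helper -f Modelfile
ollama run writing-helper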

With Ollama and Open WebUI, a private LLM workstation is no longer a weekend project. It’s a practical tool you can deploy in minutes and refine over time.
