Why run a local AI assistant?
Cloud AI tools are convenient, but a local setup can be faster for repeated tasks, cheaper over time, and more private for sensitive notes, logs, or internal documentation. Running an AI model locally is also a great way to learn modern AI tooling without committing to a paid API. In this guide, you will install Ollama on Linux, download a model, test it from the terminal, and optionally add a lightweight web interface for a more comfortable chat experience.
Prerequisites
You need a Linux machine (Ubuntu/Debian/Fedora/Arch all work), at least 8 GB of RAM for smaller models, and preferably a modern multi-core CPU. A GPU helps but is not required for many models. You will also need curl and basic terminal access with sudo privileges.
Step 1: Install Ollama
Ollama provides a simple installer script for Linux. Open a terminal and run (if you prefer, download the script first and review it before executing):
curl -fsSL https://ollama.com/install.sh | sh
After installation, verify the service is available:
ollama --version
On most systems, Ollama runs as a background service. If you want to check its status on a systemd-based distribution, use:
systemctl status ollama
Step 2: Download a model (and understand what you are pulling)
With Ollama, you download models using the pull command. A good starting point for general chat is a smaller, responsive model. For example:
ollama pull llama3.1
If disk space or RAM is limited, consider smaller variants (often labeled with fewer parameters). If you want code-focused answers, try a coding model such as:
ollama pull codellama
Model size matters. Larger models typically produce better results but require more RAM and may run slower. If performance feels sluggish, choose a smaller model rather than assuming something is broken.
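To see which models you already have and how much disk space each one uses, you can list them. A quick check, with a fallback message in case the ollama binary is not on your PATH:

```shell
# List downloaded models with their sizes; fall back gracefully
# if the ollama binary is not installed or not on PATH.
if command -v ollama >/dev/null 2>&1; then
  ollama list
else
  echo "ollama is not installed or not on PATH"
fi
```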
Step 3: Chat with the model from the terminal
To start a chat session:
ollama run llama3.1
You can now type prompts and get replies immediately. This is perfect for quick tasks like generating a bash one-liner, summarizing a local change log, or drafting troubleshooting steps.
For scripting, you can also pass a prompt directly:
ollama run llama3.1 "Write a systemd unit that restarts a service on failure."
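For heavier scripting, Ollama also exposes a local REST API on port 11434. A minimal sketch using the /api/generate endpoint (this assumes the llama3.1 model from above is already pulled; the fallback message is just for illustration):

```shell
# Ask the local Ollama API for a single, non-streamed completion.
# If the server is not running, print a hint instead of failing.
curl -sf http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Name one Linux init system.", "stream": false}' \
  || echo "Ollama API not reachable on 127.0.0.1:11434"
```

Because the response is JSON, this form composes well with tools like jq in larger scripts.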
Step 4: Enable remote access safely (optional but common)
By default, many local AI setups listen only on localhost for safety. If you want to use Ollama from another machine on your LAN, you need to bind it carefully and protect it with firewall rules. First, check what address Ollama is listening on:
ss -tulpen | grep 11434
If you decide to expose it, do it on a trusted network only, and restrict access to specific IPs. On Ubuntu with UFW, for example, you can allow a single workstation:
sudo ufw allow from 192.168.1.50 to any port 11434
Avoid opening the port to the public internet. A local AI endpoint without authentication is not something you want exposed.
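If you do decide to expose Ollama on your LAN, the server reads the OLLAMA_HOST environment variable to choose its bind address. One way to set it on a systemd distribution is a unit drop-in override. This is a configuration sketch: it requires root, and binding to 0.0.0.0 should only be combined with firewall restrictions like the UFW rule above:

```shell
# Add a systemd override so the Ollama service binds to all interfaces.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```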
Step 5: Add a simple web UI (Open WebUI)
Terminal chat is efficient, but a web interface makes long conversations easier and adds quality-of-life features. One popular option is Open WebUI, which can connect to Ollama. The easiest deployment is with Docker; if Docker is not installed, install it first by following your distribution's official documentation.
Run Open WebUI as a container:
docker run -d --name open-webui -p 3000:8080 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:main
On some Linux hosts, host.docker.internal may not resolve by default. If that happens, use your host’s LAN IP (for example, http://192.168.1.10:11434) or add the Docker host gateway option depending on your Docker version. Once the container is running, open:
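If host.docker.internal does not resolve on your host, recent Docker versions let you map it to the host gateway explicitly. A deployment sketch using the same image, ports, and volume as above:

```shell
# Same Open WebUI container, but explicitly map host.docker.internal
# to the Docker host so the container can reach Ollama on the host.
docker run -d --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
```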
http://localhost:3000
Complete the initial setup in the browser, then select the Ollama model you downloaded. You should be able to chat immediately through the UI while Ollama continues doing the inference locally.
Troubleshooting tips (the issues people actually hit)
Model is slow or the system becomes unresponsive: Use a smaller model, close memory-heavy apps, or move to a machine with more RAM. Local AI is RAM-hungry, and swapping to disk will kill performance.
Ollama service is not running: Restart it with sudo systemctl restart ollama and check logs using journalctl -u ollama --no-pager -n 100.
Web UI cannot connect to Ollama: Confirm Ollama is reachable at http://127.0.0.1:11434 from the host, then adjust the Open WebUI environment variable OLLAMA_BASE_URL to point to the correct address.
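The checks above can be combined into a quick diagnostic. A sketch in which each check prints a message instead of aborting, so it is safe to run on any machine:

```shell
# Quick health check for a local Ollama setup.
# 1) Is the systemd service active? (skipped on non-systemd hosts)
if command -v systemctl >/dev/null 2>&1; then
  systemctl is-active ollama || echo "ollama service is not active"
fi
# 2) Does the API answer on the default port?
curl -sf http://127.0.0.1:11434/api/tags >/dev/null \
  && echo "Ollama API reachable" \
  || echo "Ollama API not reachable on 127.0.0.1:11434"
```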
Next steps
Once your local assistant is working, you can create repeatable prompts for helpdesk replies, generate configuration templates, or summarize technical notes without sending data to a third party. For better results, experiment with different models and keep your prompts specific. Local AI gets impressive quickly when you give it clear context and constraints.