Why run AI locally?
If you work in IT, helpdesk, development, or sysadmin roles, you probably paste logs, configs, or customer data into tools to get quick answers. The problem is that cloud AI services can be expensive, rate-limited, or simply not allowed in regulated environments. Running a local AI stack on your own Linux box gives you privacy, predictable performance, and the ability to keep everything inside your network.
In this tutorial you will install Ollama (a lightweight local LLM runner) and Open WebUI (a clean web interface) on Ubuntu. You will end up with a browser-based “ChatGPT-style” experience that talks to models running on your machine. This setup works on CPU-only systems and can also use an NVIDIA GPU if you have one.
What you will build
You will install Ollama as a service, then run Open WebUI in Docker and connect it to Ollama. After that, you will download a model (for example, Llama or Mistral variants), confirm it responds, and finally secure and persist the environment for daily use.
Requirements
You need an Ubuntu system (22.04 or newer is ideal), at least 8 GB RAM (16 GB+ recommended), and enough free disk space (models can range from a few GB to tens of GB). For the web UI portion you should have Docker installed. If you want GPU acceleration, you will also need a compatible NVIDIA driver and the NVIDIA Container Toolkit.
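To confirm a machine meets these requirements, you can check the Ubuntu release, memory, and free disk space from the terminal. A quick sketch (the root filesystem is assumed here; check whichever filesystem will actually hold your models):
# Ubuntu release and kernel version
lsb_release -d && uname -r
# available RAM
free -h
# free disk space on the root filesystem
df -h /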
Step 1: Update Ubuntu and install Docker
Start by updating packages and installing Docker. If Docker is already installed, you can skip the installation step and just confirm it works.
Commands:
sudo apt update && sudo apt -y upgrade
sudo apt -y install ca-certificates curl gnupg
sudo apt -y install docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
Log out and log back in so your user can run Docker without sudo. Verify:
docker run --rm hello-world
Step 2: Install Ollama
Ollama is simple to install and runs as a background service. It exposes an API on your machine, which the web interface will use.
Commands:
curl -fsSL https://ollama.com/install.sh | sh
Confirm the service is running:
systemctl status ollama --no-pager
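Ollama listens on localhost port 11434 by default; this is the endpoint Open WebUI will talk to in Step 4. An optional quick check that the API is answering (the /api/tags route lists locally downloaded models and will be empty until Step 3):
# the root endpoint should reply with "Ollama is running"
curl http://localhost:11434
# list models known to this Ollama instance (empty for now)
curl http://localhost:11434/api/tags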
Step 3: Pull a model and test Ollama from the terminal
Now download a model. A good starting point is a smaller model that runs well on CPU; if you have more RAM and want better responses, choose a larger one. The exact names can change over time, but these examples are commonly available (the llama3.1 tag below pulls the 8B variant by default, which fits in roughly 8 GB of RAM).
Commands:
ollama pull llama3.1
ollama run llama3.1
When prompted, ask something practical like: “Explain what this Nginx error means and how to fix it.” If you get a sensible answer, Ollama is working. Type /bye (or press Ctrl+D) to leave the interactive session.
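Open WebUI will use the same HTTP API you can call yourself with curl. A minimal sketch of a one-off, non-streaming request against the model you just pulled (the prompt is only an example):
# list locally downloaded models and their sizes
ollama list
# ask the model a single question over the REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "In one sentence, what does a 502 Bad Gateway error from Nginx usually indicate?",
  "stream": false
}'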
Step 4: Run Open WebUI with Docker
Open WebUI provides a friendly interface, chat history, and a simple model selector. We will run it as a container and point it at the Ollama API.
Commands:
docker volume create open-webui
docker run -d --name open-webui --restart unless-stopped \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
On Linux, host.docker.internal does not exist by default; the --add-host flag above maps it to the Docker host's gateway (Docker 20.10 or newer). If the WebUI still cannot connect to Ollama, rerun the container in host network mode instead; note that the UI then listens on port 8080 rather than 3000:
docker rm -f open-webui
docker run -d --name open-webui --restart unless-stopped \
--network=host \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
Open your browser and visit http://localhost:3000 (or your server's IP if remote; use port 8080 if you switched to host networking). Create an admin account when prompted. You should see your Ollama models listed in the UI.
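If the page does not load or no models show up, the container logs are the quickest place to look:
# follow Open WebUI's logs
docker logs -f open-webui
# confirm the container is running and which ports it publishes
docker ps --filter name=open-webui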
Step 5: Optional GPU acceleration (NVIDIA)
If you have an NVIDIA GPU, install the correct driver first, then add the NVIDIA Container Toolkit if you plan to run GPU-enabled containers. Ollama itself can use the GPU on the host if the drivers are correctly installed. Confirm your GPU is visible:
nvidia-smi
If nvidia-smi works, test performance by running a model and watching GPU utilization in another terminal. If you see GPU usage increase during generation, your local AI is accelerated.
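A simple way to do that is to keep nvidia-smi refreshing while you send a prompt; Ubuntu's ubuntu-drivers tool can install a recommended driver if you do not have one yet, and newer Ollama releases can report whether a loaded model is running on the GPU:
# install a recommended NVIDIA driver if none is present yet (reboot afterwards)
sudo ubuntu-drivers autoinstall
# refresh GPU utilization and memory usage every second
watch -n 1 nvidia-smi
# show loaded models and whether they sit on GPU or CPU (recent Ollama versions)
ollama ps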
Step 6: Basic hardening and useful tips
If this server is on a network, avoid exposing the WebUI to the entire internet. Place it behind a reverse proxy with authentication (for example, Nginx with basic auth) or restrict access at the firewall. Also remember that Open WebUI stores chat history (and any uploaded files) in its Docker volume, so treat that data directory like sensitive application data and back it up appropriately.
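If the UI only needs to be reachable from the machine itself, or from a reverse proxy running on the same host, a simple option is to publish the port on the loopback interface only; this is also more reliable than a ufw rule alone, because ports published by Docker can bypass ufw. A sketch, reusing the container and volume from Step 4:
docker rm -f open-webui
docker run -d --name open-webui --restart unless-stopped \
-p 127.0.0.1:3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main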
A few practical tips: keep an eye on disk usage as you try different models, standardize on one or two “default” models for your team, and document prompts for your most common workflows (log analysis, PowerShell troubleshooting, ticket replies, and postmortems). Local AI is at its best when it is part of a repeatable process, not just a toy.
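For the disk-usage point, Ollama and Docker can both report what they are storing (the model name in the remove example is only an illustration):
# show downloaded models and their sizes
ollama list
# delete a model you no longer need
ollama rm mistral
# show how much space Docker images and volumes use
docker system df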
Conclusion
With Ollama and Open WebUI, you can run a capable local AI assistant on Ubuntu in under an hour, using either CPU-only hardware or an NVIDIA GPU for faster output. The result is a private, controllable tool that can help with troubleshooting, scripting ideas, documentation drafts, and day-to-day IT tasks—without sending your data to external services.