Running an AI chatbot locally is no longer a research project reserved for labs. With today’s lightweight LLM runtimes, you can host a private assistant on your own Linux machine and keep your data off third-party servers. This tutorial shows how to install Ollama (a simple local LLM runner) and Open WebUI (a clean web interface) using Docker, then load a model and start chatting from your browser.
What you will build
By the end of this guide, you will have a local web-based AI chat interface reachable from your LAN (or just your own PC). You’ll be able to pull models on demand, start conversations, and keep everything on your own storage. The setup works well for home labs, internal IT tools, offline environments, and privacy-focused workflows.
Prerequisites
Recommended system: 64-bit Linux (Ubuntu/Debian/Fedora work fine), at least 8 GB RAM (16 GB is better), and plenty of disk space (models can take several GB). A GPU is optional; CPU-only is still usable with smaller models. You also need admin access (sudo) and an internet connection for the initial downloads.
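Before you start, it can help to confirm the machine actually meets these requirements. These are generic checks (not specific to this stack), but they quickly show available RAM, free disk space in your home directory, and CPU core count:
free -h
df -h ~
nproc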
Step 1: Install Docker
If Docker is not installed, install it using your distro’s package manager or the official Docker repository. On Ubuntu/Debian, you can use:
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable --now docker
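To confirm the installation worked and the daemon is running, a quick sanity check looks like this (the hello-world image is Docker's own test container):
docker --version
sudo docker run --rm hello-world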
To run Docker without typing sudo every time, add your user to the docker group (log out and back in after):
sudo usermod -aG docker $USER
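After logging back in (or running newgrp docker in the current shell), you can verify the group change took effect by listing your groups and running a Docker command without sudo:
groups | grep docker
docker ps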
Step 2: Create a working folder and Docker network
A dedicated folder keeps configuration tidy. Create it anywhere you like:
mkdir -p ~/local-ai
cd ~/local-ai
Create a Docker network so containers can reliably talk to each other by name:
docker network create localai
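You can confirm the network was created before moving on:
docker network ls | grep localai
docker network inspect localai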
Step 3: Start Ollama (LLM runtime)
Ollama exposes an API that other apps (like WebUI) can use. Start it with a persistent volume so models survive reboots:
docker run -d --name ollama \
  --network localai \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama:latest
Verify it is running:
docker ps
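You can also check that the API itself is answering on port 11434. Ollama replies to a plain GET on its root with a short status message, and /api/tags lists the models you have pulled (the list will be empty until the next step):
curl http://localhost:11434
curl http://localhost:11434/api/tags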
Step 4: Pull a model and test from the command line
Now pull a model inside the Ollama container. For a balanced first run, try a smaller model if your RAM is limited. Example:
docker exec -it ollama ollama pull llama3.2
Test a quick prompt:
docker exec -it ollama ollama run llama3.2 "Write a short checklist for patching a Linux server safely."
If you get a response, the core runtime is working.
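If you prefer to test the HTTP API directly (which is what Open WebUI will talk to), a minimal request to Ollama's /api/generate endpoint looks like this; the model name must match whatever you pulled above:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "List three common causes of high CPU load on Linux.",
  "stream": false
}'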
Step 5: Start Open WebUI (browser chat interface)
Open WebUI provides a friendly UI similar to popular online chat tools, but it stays in your environment. Start it and point it to Ollama using the container name:
docker run -d --name open-webui \
  --network localai \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
Open your browser and go to:
http://localhost:3000
If you’re accessing from another PC on the network, replace localhost with the Linux server’s IP (for example, http://192.168.1.50:3000).
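If the page does not load, a quick check from the server itself helps narrow down whether the problem is the container or the network path, for example:
curl -I http://localhost:3000
docker logs open-webui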
Step 6: Select the model and start chatting
In Open WebUI, look for the model selector. If your Ollama container already pulled llama3.2, it should appear automatically. Choose it, start a new chat, and try an IT-focused prompt such as “Explain the difference between RAID1 and RAID10 with practical examples.”
Troubleshooting tips (common issues)
WebUI loads but no models appear: Confirm that OLLAMA_BASE_URL points to http://ollama:11434 and that both containers are attached to the same Docker network (localai). Run docker logs open-webui and look for connection errors to http://ollama:11434.
Slow responses or timeouts: Use a smaller model, close other heavy workloads, and verify you have enough RAM. On CPU-only systems, large models can feel sluggish.
Cannot access from another computer: Make sure port 3000 is allowed through your firewall (UFW, firewalld, or your cloud security group), and confirm with docker ps that the -p 3000:8080 port mapping is actually in place. Example firewall commands follow below.
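The exact firewall command depends on your distro. On UFW-based systems (Ubuntu) and firewalld-based systems (Fedora/RHEL), opening port 3000 looks roughly like this:
sudo ufw allow 3000/tcp
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --reload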
Optional: Make it easier with Docker Compose
Once you’re happy with the setup, consider moving these commands into a Docker Compose file for simpler restarts and upgrades. The key idea remains the same: one container runs Ollama on port 11434, another runs Open WebUI on port 3000, and a shared network connects them.
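As a starting point, here is a minimal docker-compose.yml sketch equivalent to the commands above; treat it as a template and adjust names, ports, and image tags to taste. Note that Compose places both services on its own default network, so they can reach each other by service name without the explicit localai network:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama:
  open-webui:

With this file in your working folder, docker compose up -d brings the whole stack up, and docker compose pull followed by docker compose up -d handles upgrades.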
With this local AI stack, you can experiment safely, build internal tools, and keep sensitive prompts under your control. As you get comfortable, try different models, tune prompts for helpdesk automation, or connect the WebUI to documentation snippets for faster internal answers.