Set Up a Local AI Coding Assistant on Linux with Ollama and Continue (VS Code)

Why run a local AI assistant?

Cloud AI tools are convenient, but they are not always the best fit for real work. If you handle private code, customer data, or internal repositories, sending prompts to a third-party service can raise compliance and security questions. A local AI setup gives you more control over your data, works even when the network is slow, and can reduce ongoing costs. In this tutorial, you will install Ollama (a lightweight local LLM runtime) and connect it to Continue (a popular AI coding extension) in Visual Studio Code on Linux.

What you will build

By the end, you will have a working local coding assistant that can explain code, refactor functions, generate tests, and answer questions about your project directly inside VS Code. The assistant will run on your machine and serve requests through a local HTTP endpoint. This guide focuses on a practical, repeatable setup that you can copy to developer laptops or a shared workstation.

Requirements

You need a modern Linux distribution (Ubuntu, Debian, Fedora, or similar), at least 8 GB RAM (16 GB is better), and enough disk space for models (start with 5–10 GB free). A GPU is optional; many models run well on CPU for moderate usage. You also need VS Code installed and permission to run shell commands.
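
If you are not sure whether a machine meets these requirements, a quick check from the terminal is enough. The commands below are standard Linux utilities and only read system information:

free -h     # total and available RAM
df -h       # free disk space per filesystem (models need several GB)
nproc       # number of CPU cores available for inference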

Step 1: Install Ollama

Ollama provides a simple way to download and run LLMs locally. Install it using the official install script. Open a terminal and run:

curl -fsSL https://ollama.com/install.sh | sh
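
If your security policy discourages piping scripts straight into a shell, you can download the script first, review it, and then run it:

curl -fsSL https://ollama.com/install.sh -o ollama-install.sh
less ollama-install.sh      # review what the script will do
sh ollama-install.sh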

After installation, verify it is available:

ollama --version

On most systems, Ollama also starts a background service automatically. If you want to confirm the service is listening locally, run:

ss -lntp | grep 11434

The default API endpoint is typically http://localhost:11434.
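
On systemd-based distributions, the install script usually registers Ollama as a service named ollama. A quick way to confirm both the service and the API endpoint (the same /api/tags route is used again in the troubleshooting section below):

systemctl status ollama                  # should show "active (running)"
curl http://localhost:11434/api/tags     # returns a JSON list of installed models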

Step 2: Download a model (recommended options)

Next, pull a model that fits your hardware. For a balanced local coding assistant, start with a smaller model and upgrade later. Run one of these commands:

ollama pull qwen2.5-coder:7b

ollama pull codellama:7b
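
Either model works for the rest of this guide; just note the exact name and tag, because you will need it again in the configuration step. Once a pull completes, you can confirm what is installed and how much disk space it uses:

ollama list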

To test it quickly in the terminal:

ollama run qwen2.5-coder:7b

Ask something simple like “Explain what a mutex is in plain English” or paste a small function and request a refactor. If responses are extremely slow, consider switching to a smaller model or closing memory-heavy applications.
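
You can also exercise the model non-interactively, which is handy for a quick smoke test or a script. Both commands below assume the qwen2.5-coder:7b model and the default port; the second one talks to the same local API server that Continue will use:

# One-shot prompt without starting an interactive session
ollama run qwen2.5-coder:7b "Explain what a mutex is in plain English."

# The same request against the local HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Explain what a mutex is in plain English.",
  "stream": false
}'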

Step 3: Install Continue in VS Code

Open VS Code, go to the Extensions view, and search for Continue. Install the extension published as “Continue - AI Code Assistant”. After installation, you will see a Continue panel (usually on the left activity bar) where you can chat and run code-related actions.
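
If you prefer to script the setup (for example when preparing several developer laptops), VS Code extensions can also be installed from the command line by ID. The Marketplace ID for Continue is Continue.continue at the time of writing; if the command fails, confirm the ID in the Extensions view:

code --install-extension Continue.continue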

Step 4: Configure Continue to use Ollama

Continue reads its configuration from a local config file (JSON in the extension versions this guide targets; newer releases may use a YAML file instead). Open the Continue config from the extension's UI, which provides a link often labeled “Open Config”. In many setups the file lives under your home directory, such as:

~/.continue/config.json

Set the provider to Ollama and specify the model name you downloaded. A typical configuration looks like this:

{
  "models": [
    {
      "title": "Local Qwen Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ]
}

Save the file and reload VS Code. In Continue, select your local model from the model dropdown if it is not already active. From this point, your chat requests and code actions should be handled by Ollama on localhost.

Step 5: Practical workflows you can use immediately

A local model becomes useful when you give it tight, specific tasks. Try these workflows in Continue: explain a complex function in your repo, refactor a block of code to reduce duplication, or generate unit tests for a module. For best results, include constraints such as “keep public function signatures unchanged” or “write tests using pytest and avoid network calls”.

If you work on infrastructure code, ask the assistant to review a systemd unit file, a Kubernetes manifest, or an Nginx config for common mistakes. Since the model runs locally, you can iterate quickly without worrying about usage costs, rate limits, or uploading sensitive config snippets.

Troubleshooting tips

Continue cannot connect to Ollama: Make sure Ollama is running and listening on port 11434. Restart it if needed, or check for local firewall rules. Confirm the endpoint is reachable with curl http://localhost:11434/api/tags.
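
On systemd-based systems, restarting the service and re-checking the endpoint usually narrows this down. A minimal sequence, assuming the service unit is named ollama as created by the install script:

sudo systemctl restart ollama
systemctl status ollama                  # confirm it is active (running)
curl http://localhost:11434/api/tags     # confirm the API answers locally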

Model name mismatch: The model string in Continue must match what Ollama lists. Run ollama list and copy the exact name (including tags like :7b).

Slow responses: Use a smaller model, reduce parallel applications, or consider enabling GPU acceleration if your system supports it. Also keep prompts focused; sending entire repositories in one request will slow down any local model.
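
Recent Ollama releases also ship an ollama ps command that lists the models currently loaded and whether they are running on CPU or GPU, which helps confirm whether acceleration is actually in use; on NVIDIA hardware, nvidia-smi is another quick check. Both commands assume those tools are available on your system:

ollama ps       # loaded models and their CPU/GPU placement
nvidia-smi      # GPU memory use and utilization (NVIDIA only)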

Next steps

Once the basic setup works, you can standardize it for a team by documenting the chosen model and shipping a ready-to-use Continue config. You can also experiment with different local models for different tasks (one optimized for coding, another for general documentation). The most important habit is to keep prompts precise and treat the assistant like a fast helper, not a source of final truth. With that mindset, a local AI coding assistant can become a reliable part of your daily Linux development workflow.
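
As a starting point, a short provisioning script can pull the agreed-upon model and drop a shared Continue config into place on each machine. This is only a sketch: team-continue-config.json is a hypothetical shared file your team would maintain, and the target path assumes the config.json location used earlier in this guide.

#!/bin/sh
# Provision a developer machine with the team's local assistant setup.
set -e

MODEL="qwen2.5-coder:7b"                      # model the team has standardized on
SHARED_CONFIG="./team-continue-config.json"   # hypothetical shared Continue config kept alongside this script

ollama pull "$MODEL"                          # download or update the local model
mkdir -p "$HOME/.continue"                    # ensure the Continue config directory exists
cp "$SHARED_CONFIG" "$HOME/.continue/config.json"

echo "Done. Reload VS Code and pick the local model in Continue."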
