Not every AI task needs a frontier model or a cloud round-trip. Drafting an email, summarizing a document, renaming a batch of files, asking a quick coding question — a small model running on your own laptop handles all of these well, privately, and for free.
Ollama is the simplest way to get there. It runs open models locally with a single command, and exposes them through an API that most AI tools already know how to talk to.
Why bother running AI locally
- Cost. Recurring subscriptions add up. Local inference has no per-request fee once the model is downloaded.
- Privacy. Your prompts and documents never leave your machine.
- Availability. It works offline, on a plane, behind a firewall.
- Learning. Running the model yourself demystifies what's actually happening.
This isn't about replacing the big hosted models — it's about not reaching for them when something smaller does the job.
The five-minute setup
- Install Ollama from the official site for your OS.
- Pull a model. Start small so it runs comfortably:
  ollama pull llama3.2
- Chat with it right from the terminal:
  ollama run llama3.2
That's a working local assistant. No account, no API key.
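To confirm from a script that the setup worked, one option is to ask the local server which models it has pulled. The sketch below assumes Ollama is running at its default address and uses its `/api/tags` endpoint; the helper names `model_names` and `installed_models` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default local address


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def installed_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(installed_models())
    except OSError as err:  # URLError is a subclass of OSError
        print(f"Ollama does not seem to be running: {err}")
```

If the list comes back empty, the server is up but no model has been pulled yet.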
Wiring it into your workflow
By default, Ollama serves an HTTP API at http://localhost:11434. Any editor, script, or app that supports a custom endpoint can therefore point at your local model, including note-takers, code editors, and command-line tools.
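As a rough illustration of what "pointing at your local model" looks like from a script, here is a minimal sketch that sends one prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes the server is on the default port and that `llama3.2` has already been pulled; `build_request` and `ask` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: default local address


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt and return the model's full reply as one string."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(ask("Summarize in one sentence: local models keep data on your machine."))
    except OSError as err:  # covers connection errors and HTTP errors
        print(f"Could not reach Ollama: {err}")
```

Setting `"stream": False` asks for the whole reply in one JSON object, which keeps the example short; tools that show text as it is generated would typically leave streaming on and read the response line by line.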
The goal isn't to run everything locally. It's to know which tasks can run locally — and quietly move them off your bill.
Choosing a model size
A rough rule: pick the smallest model that's still good enough for the task. Bigger models are slower and need more memory; for everyday drafting and summarizing, a small or mid-size model is usually indistinguishable from a larger one in practice.
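To get a feel for how memory scales with model size, a back-of-the-envelope estimate is parameter count times bytes per weight. The sketch below is a rule of thumb only: it assumes 4-bit quantized weights by default, ignores the extra memory needed for context and runtime overhead, and the parameter counts in the loop are illustrative sizes rather than specific models.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    # Illustrative sizes only; real memory use will be somewhat higher.
    for size in (1, 3, 8, 70):
        print(f"{size}B parameters at 4-bit: ~{estimate_weights_gb(size):.1f} GB of weights")
```

If the estimate is already close to your machine's available memory, that is a sign to stay with a smaller model.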
Start with one small model, use it for a week, and only reach for something larger when you actually hit its limits.