What you will build

A private AI assistant running on your own hardware, talking to no one else.

Type a prompt in your terminal or hit the local REST API from your own code. Llama 3 answers. Nothing goes to OpenAI, Google, or anywhere outside your machine.

You → Ollama → local model → you

$ ollama run llama3.2 "summarise this in 3 bullet points"

// Before you start

What you need (3 things)

A computer running macOS, Windows, or Linux: Apple Silicon (M1–M4) is fastest. Any modern CPU works.
~4GB free disk space per model: Small models need less. phi3:mini is 2.3GB and Llama 3.2 3B is 2GB.
5 minutes: Most of that is the model download on first run.

Skim: Already know what you're doing?

Copy both commands ↓

# 1. Install Ollama

$ curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull and chat — that's it

$ ollama run llama3.2

Step 1 of 4

Install Ollama

1 min

Ollama is a small binary that manages model downloads and runs an inference server in the background. It takes one command to install on every OS.

Terminal · mac

$ curl -fsSL https://ollama.com/install.sh | sh

Or download the .dmg from ollama.com if you prefer a GUI installer. The DMG version adds a menu-bar icon.

Verify it worked

$ ollama --version
ollama version 0.5.1
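
If you'd rather run this check from a script, here is a minimal Python sketch. It assumes the ollama binary is on your PATH and that the version output contains a dotted number like 0.5.1. The function names parse_version and installed_version are ours, not part of Ollama.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_version(output: str) -> Optional[str]:
    """Pull the first dotted version number (e.g. 0.5.1) out of CLI output."""
    match = re.search(r"\d+\.\d+\.\d+", output)
    return match.group(0) if match else None


def installed_version() -> Optional[str]:
    """Return the Ollama version, or None if the binary is not on PATH."""
    if shutil.which("ollama") is None:
        return None
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # Some builds may print warnings to stderr, so search both streams.
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_version() or "ollama not found on PATH")
```

A result of None means the same thing as "command not found" in your terminal: open a new shell and try again.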

Checkpoint

Does ollama --version print a version number?


Step 2 of 4

Pull a model

1–4 min (download)

Now grab a model. Llama 3.2 3B is the right starting point: it's fast, accurate enough for most tasks, and downloads in under 2 minutes on a decent connection.

Terminal · mac

$ ollama pull llama3.2

Other popular picks — run any of these instead if you prefer:

Terminal · examples

$ ollama pull phi3 # Microsoft, tiny + fast (2.3GB)
$ ollama pull mistral # Mistral 7B, great for code (4.1GB)
$ ollama pull gemma2 # Google, balanced (5.4GB)
$ ollama pull llama3.1:70b # big, needs 48GB RAM

What you should see

pulling manifest
pulling 74701a8c35f6... 100% ▕████████▏ 2.0 GB
success

Checkpoint

Did the download finish with success?


Step 3 of 4

Chat with it

1 min

Now talk to it. Run the model by name — Ollama starts an interactive session. Type a prompt, press Enter. Type /bye to exit.

Terminal · mac

$ ollama run llama3.2

What you should see

>>> (type your prompt here)
>>> summarise this in 3 bullet points: [your text]
• First point...
• Second point...
• Third point...
>>> /bye

Useful one-liner patterns (pipe text into the model):

Terminal · one-liners

# Summarise a file
$ cat notes.txt | ollama run llama3.2 "summarise this"
# Ask a question without interactive mode
$ ollama run llama3.2 "what is the capital of France?"
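
The same pipe pattern can be driven from Python. This is a rough sketch, assuming ollama is on your PATH and the model is already pulled. build_command and ask are hypothetical helper names, and notes.txt is just the example file from above.

```python
import subprocess
from typing import List, Optional


def build_command(model: str, instruction: str) -> List[str]:
    """Build the argv for a one-shot, non-interactive `ollama run` call."""
    return ["ollama", "run", model, instruction]


def ask(model: str, instruction: str, piped_text: Optional[str] = None) -> str:
    """Run the model once; piped_text plays the role of `cat file |`."""
    result = subprocess.run(
        build_command(model, instruction),
        input=piped_text,      # sent to the process on stdin
        capture_output=True,
        text=True,
        check=True,            # raise if ollama exits with an error
    )
    return result.stdout.strip()


# Example usage (needs a running Ollama and a notes.txt file):
#   with open("notes.txt", encoding="utf-8") as handle:
#       print(ask("llama3.2", "summarise this", handle.read()))
```

Passing the command as a list avoids shell quoting problems when the instruction contains quotes or spaces.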

Checkpoint

Did the model respond to your prompt?


Step 4 of 4 · Final

Use the API

1 min

Ollama runs a local REST API at http://localhost:11434. It also exposes an OpenAI-compatible endpoint under /v1, so most code that calls OpenAI can call Ollama by changing the base URL. You can call it with curl, Python, Node, or anything else.

Terminal · mac

$ curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello!","stream":false}'
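
The curl call above translates to Python without any extra packages. A minimal sketch using only the standard library: build_request and generate are our own names, and the 120-second timeout is an arbitrary guess to leave room for a cold model load.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the text in the "response" field."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


# Example usage (needs the Ollama server running):
#   print(generate("llama3.2", "Hello!"))
```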

OpenAI-compatible endpoint (drop-in for most SDKs):

Python · openai SDK

from openai import OpenAI

# The SDK insists on an api_key; Ollama ignores its value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

What you should see (curl response)

{
"model": "llama3.2",
"response": "Hello! How can I help you today?",
"done": true
}
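
The curl request sets "stream": false to get one JSON object back. Leave that out and, as far as we know, /api/generate streams newline-delimited JSON instead: one object per chunk, each carrying a piece of "response", with "done": true on the last one. Here is a sketch of stitching those chunks together. join_stream is a hypothetical helper, and the sample lines in the usage note are made up for illustration.

```python
import json
from typing import Iterable


def join_stream(lines: Iterable[str]) -> str:
    """Concatenate the "response" pieces from a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the answer
    return "".join(parts)
```

Feed it the lines of the HTTP response body, for example join_stream(['{"response": "Hel", "done": false}', '{"response": "lo!", "done": true}']).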

// Status

cooked. baked. worked.

You now have a local AI server running on your machine. No API key. No cloud. No monthly bill. 100+ models an ollama pull away.

$ ollama run llama3.2 "hello"

Hello! How can I help you today?

$ ollama list

NAME ID SIZE

llama3.2:latest a80c4f17acd5 2.0 GB

cooked · baked · worked ✓
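
You can get the ollama list output over HTTP too. A sketch, assuming the server's GET /api/tags endpoint returns JSON with a "models" array whose entries have a "name" field (that matches the Ollama API docs as we understand them). model_names and list_models are our own names.

```python
import json
import urllib.request
from typing import List


def model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags style payload."""
    return [entry["name"] for entry in payload.get("models", [])]


def list_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Ask the local server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as reply:
        return model_names(json.loads(reply.read()))


# Example usage (needs the Ollama server running):
#   print(list_models())
```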

// the honest part

What we did and didn't test

The 5-minute estimate assumes a fast internet connection for the model download and an M-series Mac or recent CPU. On an older Intel Mac, first-token latency for llama3.2 is about 3–4 seconds. On Windows with an NVIDIA GPU, it's under 1 second.

We tested Mac (M3), Windows (RTX 3080), and Ubuntu (A100). The 93% first-time success rate reflects the install — the main failure mode on Windows is the PATH not updating until you open a new terminal window after installing.

Model quality varies a lot. Llama 3.2 3B is good for summaries, Q&A, and simple code. For complex reasoning or long-form writing, pull a larger model such as llama3.1:8b or llama3.1:70b. The 70B needs 48GB RAM, and most laptops top out at 16GB or 32GB.

// Things that broke for other people

Trouble? Try these first.

4 fixes

Windows · Step 1 · command not found · most common

“ollama is not recognized”

'ollama' is not recognized as an internal or external command

The installer added Ollama to PATH, but your existing terminal window doesn't know yet. Close every open terminal window and open a new one. PATH changes don't apply to already-running shells.

All · Step 3 · Ollama server not running · common

Model won't start / “connection refused”

Error: dial tcp 127.0.0.1:11434: connect: connection refused

Ollama needs its background server running. Start it manually: run ollama serve in one terminal, then ollama run llama3.2 in another. On Mac/Windows the installer usually starts it automatically on login.
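
To tell "server not running" apart from other problems, check whether anything is listening on port 11434. A small sketch using a plain TCP connect. is_port_open is a hypothetical helper, and it only proves that something accepts connections on that port, not that it is Ollama.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("something is listening on 11434")
    else:
        print("nothing on 11434: try `ollama serve`")
```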

All · Step 2 · not enough disk space

Pull fails partway through

Error: no space left on device

Models are 2–8GB each. Free up disk space, then retry. Use ollama rm modelname to delete models you no longer need. Run ollama list to see what's installed.
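
A quick pre-flight check saves a failed download. This sketch compares free disk space against a model size. The 1GB headroom is an arbitrary safety margin of ours, and checking the home directory assumes models are stored under it (the usual default, but verify on your setup). has_room and free_space are hypothetical names.

```python
import os
import shutil

GB = 1000 ** 3  # decimal gigabytes, matching the sizes quoted in this guide


def has_room(model_gb: float, free_bytes: int, headroom_gb: float = 1.0) -> bool:
    """True if free_bytes covers the model plus a safety margin."""
    return free_bytes >= (model_gb + headroom_gb) * GB


def free_space(path: str = "") -> int:
    """Free bytes on the disk holding path (defaults to the home directory)."""
    return shutil.disk_usage(path or os.path.expanduser("~")).free


if __name__ == "__main__":
    # 2.0 is the Llama 3.2 3B download size from Step 2.
    print("room for llama3.2:", has_room(2.0, free_space()))
```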

All · Step 3 · slow responses

Model is very slow / takes 30+ seconds per token

You're probably running a model that's too large for your RAM. Try ollama run phi3 instead: it's 2.3GB and designed to run fast on CPU. The 70B models need 48GB RAM to fit in memory; below that they page to disk and crawl.

