What you will build

A full transcript of any video, in any language, running on your own machine.

Drop in an MP3 or paste a YouTube URL. Get back a text file, an SRT subtitle file, and a VTT for the web. Whisper handles 99 languages and can translate non-English into English in the same run.

Audio in → transcript out →

“Welcome back to the podcast. Today we're talking about the future of...”

// Before you start

What you need (4 things)

Python 3.8 or newer (3.10+ recommended) · Check with python3 --version
~3GB free disk space · The “base” model is 142MB. “Large” is 2.9GB.
An audio or video file · Or a YouTube URL, and we'll download it
8 minutes · Maybe 12 if ffmpeg fights you
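Want to check the first two items from inside Python? Here's a minimal sketch using only the standard library. The 3.8 floor and the ~3GB figure come from the list above; `python_ok` and `free_gb` are illustrative names, not part of Whisper.

```python
import shutil
import sys

# Thresholds from the checklist above: Python 3.8+ and roughly 3GB free.
MIN_PYTHON = (3, 8)
MIN_FREE_GB = 3.0


def python_ok(version=None):
    """True if (major, minor) meets Whisper's 3.8+ floor; defaults to the running Python."""
    version = version or sys.version_info[:2]
    return tuple(version) >= MIN_PYTHON


def free_gb(path="."):
    """Free disk space at `path` in GB (1 GB = 1024**3 bytes)."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "is too old (need 3.8+)")
    gb = free_gb()
    print(f"{gb:.1f} GB free", "OK" if gb >= MIN_FREE_GB else "(may be tight for the large model)")
```

Run it from the folder where you plan to work, since free space is measured there.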

Skim: Already know what you're doing?

Here are all 4 commands:

# 1. Install ffmpeg (Mac shown — see step 1 for Win/Linux)

$ brew install ffmpeg

# 2. Install Whisper + yt-dlp

$ pip install -U openai-whisper yt-dlp

# 3. Grab a YouTube video as mp3

$ yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://youtube.com/watch?v=..."

# 4. Transcribe

$ whisper audio.mp3 --model base

Step 1 of 5

Install ffmpeg

1–3 min

Whisper reads audio. ffmpeg is the tool that opens almost any audio or video file and feeds it to Whisper. A missing ffmpeg is the most common cause of “why isn't this working” in this guide, so we install it first. On Windows this is the hardest step. Stick with it.

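Terminal · mac

$ brew install ffmpeg

Don't have brew? Run this one-liner first:

Terminal · mac

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Terminal · linux (Ubuntu/Debian)

$ sudo apt update && sudo apt install ffmpeg

Terminal · windows (with Chocolatey; Scoop users run scoop install ffmpeg)

$ choco install ffmpeg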

Verify it worked

$ ffmpeg -version
ffmpeg version 6.1 Copyright (c) 2000-2024 ...
built with Apple clang version 15.0.0 ...

This might happen (Windows, very common)

ffmpeg is not recognized

'ffmpeg' is not recognized as an internal or external command

The exe is on your disk but Windows doesn't know where to find it. Add the folder containing ffmpeg.exe to your PATH. In Settings → search “Edit environment variables” → User variables → Path → New → paste the folder path (e.g. C:\ffmpeg\bin). Close every open terminal and open a new one — PATH changes do not apply to already-open shells.
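Whisper launches ffmpeg as a separate program, so it depends on exactly this PATH lookup. To confirm the fix from the Python side, here's a small standard-library sketch. Run it from a freshly opened terminal, because a Python process only sees the PATH it inherited; `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helper names.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Full path to ffmpeg if it is on this process's PATH, else None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return its first line, e.g. 'ffmpeg version 6.1 ...'."""
    out = subprocess.run([path, "-version"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not on PATH: add its folder to PATH, then open a NEW terminal")
    else:
        print(path)
        print(ffmpeg_version_line(path))
```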

Checkpoint

Does ffmpeg -version print a version number?


Step 2 of 5

Install Whisper (and yt-dlp)

1–2 min

Now we install Whisper itself. pip is the Python package installer, and it should already be on your computer. We're also installing yt-dlp so we can fetch a YouTube video later. Both go in with one command.

Terminal · mac

$ pip install -U openai-whisper yt-dlp

What you should see (lots of text, do not panic)

Collecting openai-whisper
Collecting torch
Downloading torch-2.x.x-...whl (200+ MB)
... (this goes on for a while)
Successfully installed openai-whisper torch yt-dlp ...

This might happen

pip is too old / wrong Python

ERROR: Could not find a version that satisfies the requirement openai-whisper

This error (or a message that Python 3.7 is too old) means pip is running under a Python older than 3.8. Whisper requires Python 3.8+.

Try pip3 instead of pip. If that fails too, install a fresh Python 3.11 from python.org/downloads and restart your terminal.
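A common cause of both errors is pip installing into a different Python than the one you actually run. This sketch prints which interpreter is active and whether it can import whisper. One caveat, noted in the code: an unrelated PyPI package is also called `whisper`, so a positive result only means something by that name is installed.

```python
import importlib.util
import sys


def whisper_installed():
    """True if a module named `whisper` is importable by the Python running this script.

    Caveat: an unrelated PyPI package is also called `whisper`, so this only
    proves that *something* by that name is installed.
    """
    return importlib.util.find_spec("whisper") is not None


if __name__ == "__main__":
    # sys.executable is the interpreter actually running; pip must install into this one.
    print("Python:", sys.executable, sys.version.split()[0])
    if whisper_installed():
        print("whisper is importable here")
    else:
        # `python -m pip` guarantees pip targets this exact interpreter.
        print(f'Not found. Try: "{sys.executable}" -m pip install -U openai-whisper yt-dlp')
```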

Checkpoint

Does whisper --help print a help menu?


Step 3 of 5

Grab an audio file (or any video)

1 min

You can transcribe anything ffmpeg can read: MP3, MP4, MOV, M4A, WAV, and more. If you already have a file, skip to Step 4. Otherwise, let's grab a YouTube video as MP3.

Terminal · mac

$ yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"

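-x means “extract audio only.” --audio-format mp3 converts it to MP3 (yt-dlp uses ffmpeg for this). Keep the quotes around the URL: on macOS's default shell (zsh), an unquoted ? makes the command fail with “no matches found.” The MP3 lands in your current folder, named after the video title. Step 4 calls it audio.mp3, so either rename it or use its real name there. To get audio.mp3 directly, add -o "audio.%(ext)s" to the command.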
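If you'd rather drive yt-dlp from a Python script, build the arguments as a list and hand them to subprocess. That skips the shell entirely, so the ? in the URL never needs quoting. A sketch, assuming yt-dlp is on your PATH; `ytdlp_command` is a made-up helper name, and the flags are the ones described above.

```python
import subprocess
import sys


def ytdlp_command(url, basename="audio"):
    """Build the step 3 yt-dlp call as an argument list.

    -x                   extract audio only
    --audio-format mp3   convert the audio to MP3 (yt-dlp hands this to ffmpeg)
    -o NAME.%(ext)s      output template, so the result is e.g. audio.mp3
    """
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", f"{basename}.%(ext)s", url]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python grab.py "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
    subprocess.run(ytdlp_command(sys.argv[1]), check=True)
```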

What you should see

[youtube] YOUR_VIDEO_ID: Downloading webpage
[download] 100% of 12.3MiB at 4.2MiB/s
[ExtractAudio] Destination: My Video Title.mp3
✓ Deleting original file (intermediate)

This might happen

YouTube blocks the download

ERROR: [youtube] ... : Sign in to confirm you're not a bot

Update yt-dlp first (pip install -U yt-dlp). If that still fails, try a different public video, since some are age-locked or member-only. If you're using a local file instead, skip the download and put that file's name in place of audio.mp3 in Step 4.

Checkpoint

Is there an .mp3 file in your folder?
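To answer the checkpoint from Python, this sketch lists the .mp3 files in a folder and returns the most recently modified one, which is usually the file you just downloaded. `newest_audio` is a hypothetical helper name.

```python
from pathlib import Path


def newest_audio(folder=".", suffix=".mp3"):
    """Most recently modified file ending in `suffix` inside `folder`, or None."""
    files = [p for p in Path(folder).glob(f"*{suffix}") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    found = newest_audio()
    print(found if found else "No .mp3 here yet: re-run step 3 or cd into the right folder")
```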


Step 4 of 5

Transcribe it

2–5 min

Now the main event: one command. Whisper reads your audio file and writes the transcript. The first time you run this, it also downloads the model file: 142MB for base, 2.9GB for large. Start with base. It's fast and usually accurate on clear English speech; move up a size if the transcript has too many mistakes.

Terminal · mac

$ whisper audio.mp3 --model base

Other model sizes when you need more accuracy (slower):

Terminal · examples

$ whisper audio.mp3 --model small
$ whisper audio.mp3 --model medium
$ whisper audio.mp3 --model large
$ whisper audio.mp3 --model turbo    # fastest large-class model

Useful flags:

Terminal · flags

$ whisper audio.mp3 --language en           # skip auto-detect
$ whisper audio.mp3 --task translate        # any language → English (not with turbo)
$ whisper audio.mp3 --output_format srt     # subtitles only
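The flags combine freely. Here's a sketch of a hypothetical helper that assembles a whisper command line from them. It also guards the one combination the Whisper README warns about: the turbo model isn't trained for translation, so pair --task translate with a multilingual model such as small or medium.

```python
import subprocess
import sys


def whisper_args(audio, model="base", language=None, task="transcribe", output_format=None):
    """Build a `whisper` CLI call from the flags shown above.

    language=None leaves language auto-detection on; output_format=None keeps
    Whisper's default of writing every format.
    """
    if task == "translate" and model == "turbo":
        # Per the Whisper README, turbo is not trained for translation.
        raise ValueError("use a multilingual model (e.g. small or medium) for --task translate")
    args = ["whisper", audio, "--model", model, "--task", task]
    if language:
        args += ["--language", language]
    if output_format:
        args += ["--output_format", output_format]
    return args


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_whisper.py audio.mp3
    subprocess.run(whisper_args(sys.argv[1]), check=True)
```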

What you should see

100%|████████████████| 142M/142M [00:08<00:00, 17.4MB/s]
[00:00.000 --> 00:06.400] Welcome back to the podcast.
[00:06.400 --> 00:13.200] Today we're talking about...
... (lines stream as it processes)
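Each streamed line is [start --> end] text. If you save that output (for example by pasting it into a file), this sketch turns the lines back into numbers. It assumes the format shown above, where Whisper prints MM:SS.mmm and adds an hours field once the audio passes an hour; `parse_line` and `to_seconds` are illustrative names.

```python
import re

# Matches lines like "[00:06.400 --> 00:13.200] Today we're talking about..."
# The leading "HH:" part is optional because it only appears past the one-hour mark.
LINE = re.compile(r"^\[((?:\d+:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d+:)?\d{2}:\d{2}\.\d{3})\]\s*(.*)$")


def to_seconds(stamp):
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to float seconds."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total


def parse_line(line):
    """Return (start_s, end_s, text) for a timestamped line, or None for any other output."""
    m = LINE.match(line.strip())
    if not m:
        return None
    return to_seconds(m.group(1)), to_seconds(m.group(2)), m.group(3)
```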

This might happen

CUDA out of memory (only matters if you have an NVIDIA GPU)

RuntimeError: CUDA out of memory.

The model is too big for your GPU. Either use a smaller model (--model small) or force CPU with --device cpu (slower but always works).
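Picking a model that fits your card is mostly a lookup. The VRAM figures below are the approximate ones from the Whisper README's model table; real usage varies, so treat the result as a rough first guess, not a guarantee. `largest_model_that_fits` is a hypothetical helper; None means “use --device cpu.”

```python
# (model, approximate VRAM needed in GB), smallest to largest, per the Whisper README table.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("turbo", 6), ("large", 10)]


def largest_model_that_fits(free_vram_gb):
    """Largest listed model whose approximate VRAM need fits, or None (fall back to CPU)."""
    best = None
    for name, need in MODELS:
        if need <= free_vram_gb:
            best = name  # list is ordered, so the last fit is the largest
    return best


if __name__ == "__main__":
    for vram in (0.5, 4, 8, 12):
        print(f"{vram} GB free ->", largest_model_that_fits(vram) or "--device cpu")
```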

Checkpoint

Did the command finish without errors and leave new files in your folder?


Step 5 of 5 · Final

Read the output

30 sec

Whisper writes a handful of files to your current folder (the one you ran the command from, or wherever --output_dir points). Each one is the same transcript in a different format:

audio.txt: plain text, no timestamps. Best for pasting anywhere.
audio.srt: subtitles for video players (VLC, Premiere, DaVinci Resolve).
audio.vtt: subtitles for the web (HTML5 <track>).
audio.tsv: tab-separated start/end times (in milliseconds) + text. Good for spreadsheets.
audio.json: machine-readable, with per-segment timestamps and decoding stats. Add --word_timestamps True if you also want per-word timings and probabilities.
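SRT and VTT carry the same timings and differ mainly in small details: SRT numbers each cue and writes HH:MM:SS,mmm, while VTT starts with a WEBVTT header and uses a dot before the milliseconds. This sketch builds both from (start, end, text) tuples. It illustrates the formats and is not Whisper's own writer, so its output may differ cosmetically from the files Whisper produces.

```python
def format_timestamp(seconds, decimal_marker=","):
    """Seconds -> 'HH:MM:SS,mmm' (SRT) or 'HH:MM:SS.mmm' (VTT, pass decimal_marker='.')."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments):
    """segments: iterable of (start_s, end_s, text). Returns SRT text, cues numbered from 1."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """Same cues as WebVTT: header line, dot decimal marker, no cue numbers."""
    parts = ["WEBVTT\n"]
    for start, end, text in segments:
        parts.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text.strip()}\n")
    return "\n".join(parts)
```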

Terminal · mac

$ ls audio.*

What you should see

audio.mp3
audio.txt
audio.srt
audio.vtt
audio.tsv
audio.json
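The JSON file is the easiest one to script against. It holds a "segments" list whose entries include "start" and "end" (in seconds) and "text". This sketch loads it and finds every segment that mentions a phrase, which is the “searchable” part of searchable text. `load_segments` and `search` are hypothetical helper names.

```python
import json
import sys
from pathlib import Path


def load_segments(path="audio.json"):
    """Read Whisper's JSON output and return its list of segment dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["segments"]


def search(segments, phrase):
    """(start_seconds, text) for every segment containing `phrase`, case-insensitive."""
    needle = phrase.lower()
    return [(seg["start"], seg["text"].strip()) for seg in segments if needle in seg["text"].lower()]


if __name__ == "__main__" and len(sys.argv) > 1 and Path("audio.json").exists():
    # Usage: python find.py "future of"
    for start, text in search(load_segments(), sys.argv[1]):
        print(f"{start:8.2f}s  {text}")
```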

// Status

cooked. baked. worked.

You just turned an opaque audio file into searchable, editable text. 99 languages. No API key. No cloud upload. Runs on a laptop battery if you want.


// the honest part

What we did and didn't test

Numbers above (8 minutes total, 88% worked first time) are estimates pending Michael's end-to-end verification on all three OSes. We will update them with real numbers once he re-runs this exact flow.

Three things to expect that the README doesn't spell out: (1) M1/M2 Mac is fast, Intel Mac feels slow on models bigger than base. (2) Windows ffmpeg PATH is the hardest step — most failures happen here, not in Whisper itself. (3) The large model is a 2.9GB download the first time you call it — plan for it.

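If you have an NVIDIA GPU and a CUDA-enabled PyTorch, Whisper uses the GPU automatically and is dramatically faster on bigger models. (The default pip install of PyTorch on Windows is CPU-only, so an idle GPU there usually means you need the CUDA build.) Apple Silicon is not accelerated by default: the whisper command picks CUDA when it's available and otherwise runs on the CPU. Pure CPU works fine for base and small; large on CPU is ~30 minutes for an hour of audio.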

// Things that broke for other people

Trouble? Try these first.

5 fixes

Windows · Step 1 · ffmpeg PATH · most common

“ffmpeg is not recognized”

'ffmpeg' is not recognized as an internal or external command

ffmpeg.exe is on disk but Windows can't find it. Add the folder (the one containing ffmpeg.exe) to PATH: Start → “Edit environment variables” → User variables → Path → New → paste the folder. Then close every open terminal and reopen.

All · Step 2 · Python version · common

“Python 3.7 is too old”

ERROR: Could not find a version that satisfies the requirement openai-whisper

Whisper needs Python 3.8+. Try pip3 instead of pip. If that still fails, install Python 3.11 from python.org/downloads and restart your terminal.

Mac · Step 1 · brew

“brew: command not found”

Homebrew isn't installed yet. Run the bootstrap line shown in step 1. After it finishes, brew prints a hint to add itself to PATH — copy/paste that, then try brew install ffmpeg again.

NVIDIA · Step 4 · GPU memory

CUDA out of memory on --model large

RuntimeError: CUDA out of memory.

Drop to --model medium or small. Or force CPU with --device cpu (much slower, but doesn't care about VRAM).

All · Step 3 · YouTube blocks

yt-dlp fails with “Sign in to confirm you're not a bot”

Update yt-dlp first (pip install -U yt-dlp). The project ships frequent fixes for YouTube's anti-bot changes. If it still fails on a specific video, try another, since some are age- or member-locked.

// the daily build

Liked this? One a day, in your email.

Tomorrow's best AI project. Same kind of guide. Free. No spam ever.

Turn any YouTube video into searchable text.
