How Whisper Runs in Your Browser (No Server Needed)

2026-06-08 · 6 min read

Speech-to-text used to mean sending your audio to a server with a GPU. Now it can run in a browser tab. Here's how that became possible and what it unlocks.

The pieces

• Whisper: OpenAI's open speech-recognition model, available in small, fast variants
• Transformers.js: runs Hugging Face models in JavaScript
• WebGPU / WebAssembly: let the browser do heavy maths efficiently
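
As a rough illustration of how the last piece fits in, a page might check for WebGPU and fall back to WebAssembly when it is missing. The sketch below is an assumption about how such a check could look, not ParleyNotes' actual code: `pickBackend`, `Backend`, and `NavigatorLike` are hypothetical names, and only the `navigator.gpu` property it inspects is a real browser API.

```typescript
// Hypothetical names for illustration; only `navigator.gpu` is a real browser API.
type Backend = "webgpu" | "wasm";

// The one part of `navigator` this helper looks at, so it also runs outside a browser.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes it, otherwise fall back to WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? "webgpu" : "wasm";
}
```

In a browser this would be called as `pickBackend(navigator)`. The presence of `navigator.gpu` is only a first check: a more careful version would also await `navigator.gpu.requestAdapter()` and fall back to WebAssembly if it resolves to `null`.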

What happens when you record

The browser captures your audio, resamples it to the 16 kHz mono signal Whisper expects, and feeds it to a Whisper model that was downloaded once from a CDN and cached. The model produces the text entirely on your device, so no audio is ever uploaded.
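
To make the resampling step concrete, here is a minimal sketch that mixes the recorded channels down to mono and resamples them to 16 kHz with linear interpolation. It is an illustration under stated assumptions, not the code ParleyNotes ships: `downmixToMono`, `resampleLinear`, and `WHISPER_SAMPLE_RATE` are hypothetical names, and linear interpolation is a deliberately simple stand-in for a proper resampler.

```typescript
// Hypothetical constant: the sample rate Whisper models expect.
const WHISPER_SAMPLE_RATE = 16000;

// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
// There is no low-pass filter here, so downsampling can alias.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples consumed per output sample
  const last = input.length - 1;

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const left = Math.min(idx, last);
    const right = Math.min(idx + 1, last);
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

In a browser, the `channels` argument could come from calling `getChannelData(i)` on a decoded `AudioBuffer`, and the resulting `Float32Array` is the kind of input a speech-recognition pipeline would take. In practice it is often simpler to let the browser do the work: creating an `AudioContext` with `sampleRate: 16000` and calling `decodeAudioData` returns audio already resampled to that rate.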

Why it's a big deal

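Because transcription happens on your device, there is no server that could see or store your audio. There are also no inference servers to pay for as usage grows, since each user's own hardware does the work. And once the model has been downloaded and cached, transcription keeps working offline. ParleyNotes is built on exactly this stack.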

Try private AI meeting notes free

Record or upload a meeting and get an on-device transcript and notes. No account, no bot, no cloud.

Open ParleyNotes →