How Whisper Runs in Your Browser (No Server Needed)
2026-06-08 · 6 min read
Speech-to-text used to require a server with a GPU. Now it can run in a browser tab. Here's how that became possible and what it unlocks.
The pieces
- Whisper: OpenAI's open speech-recognition model, available in small, fast variants
- Transformers.js: runs Hugging Face models in JavaScript
- WebGPU / WebAssembly: let the browser do heavy maths efficiently
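To make the WebGPU / WebAssembly choice concrete, here is a minimal sketch of how a page might decide which backend to use. The names `GpuCapableNavigator`, `Device`, and `pickDevice` are made up for this illustration and are not taken from ParleyNotes. The sketch assumes the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no usable GPU adapter exists.

```typescript
// Minimal structural type for the part of `navigator` used here, so the
// sketch type-checks even without WebGPU type definitions installed.
// (Hypothetical name, introduced only for this example.)
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<unknown> };
}

type Device = "webgpu" | "wasm";

/**
 * Pick "webgpu" only when the browser exposes WebGPU AND hands back an
 * adapter; otherwise fall back to WebAssembly, which runs on the CPU.
 */
async function pickDevice(nav: GpuCapableNavigator | undefined): Promise<Device> {
  // No navigator, or no WebGPU API on it: use WebAssembly.
  if (!nav?.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return "wasm";
  }
}
```

In a page this would be called with the global `navigator` (a cast may be needed if your TypeScript DOM typings predate WebGPU). As far as I know, recent Transformers.js releases accept the resulting string as the `device` option when creating an `automatic-speech-recognition` pipeline, but check the version you are using.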
What happens when you record
The browser captures your audio, resamples it to 16 kHz mono (the input format Whisper expects), and feeds it to a Whisper model that was downloaded once from a CDN and cached. The model produces the text entirely on your device, so no audio is ever uploaded.
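The resampling step described above can be sketched as follows. This is an illustration rather than the code ParleyNotes ships: `mixToMono` and `resampleLinear` are hypothetical helpers, and linear interpolation is used only because it is short. It applies no anti-aliasing filter, so a real app would more likely let the Web Audio API do the conversion, for example by decoding in an audio context created with a 16 kHz sample rate.

```typescript
// Whisper models expect mono audio sampled at 16 kHz.
const TARGET_RATE = 16000;

/** Average all channels into a single mono channel. */
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

/**
 * Resample by linear interpolation between neighbouring samples.
 * Simplification: no low-pass filter, so downsampling can alias.
 */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_RATE,
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  // Nothing to do: return a copy so callers can safely mutate the result.
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours so the last output sample never reads past the end.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For instance, one second of 48 kHz stereo audio (two arrays of 48,000 samples) would pass through `mixToMono` and then `resampleLinear(mono, 48000)` to become a single array of 16,000 samples, ready to hand to the model.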
Why it's a big deal
Running the model on-device removes the server from the privacy equation entirely, means there are no inference servers to pay for as usage grows, and lets the tool work offline after the first model load. ParleyNotes is built on exactly this stack.
Try private AI meeting notes free
Record or upload a meeting and get an on-device transcript and notes. No account, no bot, no cloud.