Why offline dictation feels faster - an interactive look

Short phrases flatter the cloud. Long sessions expose it. Experiment with your own network and see where the time goes.

Use the interactive timeline below to compare a cloud dictation workflow (upload → provider processing → optional rewrite) with Voice Type’s on-device streaming. Adjust network latency and bandwidth to see how each phase contributes to total time.

Scenario:
- Short (≈5 s of speech)
- Medium (≈2 min)
- Long (≈30 min)

Network profile:
- Office fiber - RTT 20 ms, ↑40 Mbps
- Café Wi‑Fi - RTT 60 ms, ↑5 Mbps
- Airport + VPN - RTT 120 ms, ↑2 Mbps
- My network (measured) - RTT 60 ms, ↑5 Mbps

Cloud STT speed (multiples of real time):
- Whisper Large v3 - 189×
- Whisper Large v3 Turbo - 216×
- Distil-Whisper Large v3 - 250× (EN)
- Custom - 200×

Options:
- Compressed upload (≈128 kbps)
- Cloud streaming (overlap upload+ASR)
- LLM rewrite (BYOK)

Cloud: streaming + LLM rewrite - Total: 4.4 s
- Handshakes: 240 ms
- Service overhead: 350 ms
- Streaming (upload+ASR): value not rendered
- LLM rewrite (proxy): 700 ms

On‑device: finalize + LLM rewrite - Total: 3.3 s
- Finalize last ~30s: 2.5 s
- Handshake (rewrite): 120 ms
- LLM rewrite (BYOK): 700 ms

Assumptions (realistic, simplified):
- Streaming can overlap upload with ASR; file upload cannot.
- Handshake per hop ≈ 2×RTT (DNS+TLS+warmups).
- Cloud path includes ASR hop + proxy hop; BYOK uses a single hop.
- Cloud ASR speed is the value set above (default 200× real‑time).
- On‑device shows only the last ~30s finalize (≈2.5 s).
- Same rewrite speed for both paths (≈1200 tok/s); proxy adds only hop latency.

Short phrases (5–15 seconds): handshakes dominate cloud flows. On-device transcription avoids them entirely; only an optional BYOK rewrite adds a single handshake.
Long sessions: upload size and proxy hops compound latency. On-device streams live and only finalizes the last ~30 seconds when you stop (about 2–3 seconds on an M1 Mac).

With bring-your-own-key (BYOK) rewrites, Voice Type sends text directly from your Mac to your chosen provider in a single hop. Audio stays local.

© 2025 Careless Whisper Inc.
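
The assumptions listed earlier can be turned into a small calculator. The following is a minimal sketch, not the code behind the interactive timeline: the names `Network`, `handshake_s`, `cloud_total_s`, and `on_device_total_s` are hypothetical, and treating overlapped streaming as `max(upload, ASR)` is an assumption about how the overlap is modeled. The 350 ms service overhead, 700 ms rewrite, and 2.5 s finalize are taken from the sample readout. With the Medium scenario on the Café Wi‑Fi profile, it gives roughly 4.4 s for the cloud path and 3.3 s on-device, in line with the totals shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    rtt_s: float       # round-trip time, in seconds
    uplink_bps: float  # upload bandwidth, in bits per second


def handshake_s(net: Network, hops: int) -> float:
    """Handshake cost using the rule of thumb of about 2 x RTT per hop."""
    return 2 * net.rtt_s * hops


def cloud_total_s(
    speech_s: float,
    net: Network,
    *,
    asr_speedup: float = 200.0,   # cloud ASR speed as a multiple of real time
    audio_bps: float = 128_000,   # compressed upload bitrate (about 128 kbps)
    streaming: bool = True,
    overhead_s: float = 0.35,     # service overhead from the sample readout
    rewrite_s: float = 0.7,       # LLM rewrite time from the sample readout
) -> float:
    """Estimated cloud latency: two hops (ASR + proxy), transfer, rewrite."""
    upload_s = speech_s * audio_bps / net.uplink_bps
    asr_s = speech_s / asr_speedup
    # Assumption: with streaming, upload and ASR overlap, so the slower of
    # the two sets the pace; a plain file upload pays for both in sequence.
    transfer_s = max(upload_s, asr_s) if streaming else upload_s + asr_s
    return handshake_s(net, hops=2) + overhead_s + transfer_s + rewrite_s


def on_device_total_s(
    net: Network,
    *,
    finalize_s: float = 2.5,  # finalize the last ~30 s of audio locally
    rewrite_s: float = 0.7,
) -> float:
    """Estimated on-device latency: local finalize plus one BYOK rewrite hop."""
    return finalize_s + handshake_s(net, hops=1) + rewrite_s


if __name__ == "__main__":
    cafe = Network(rtt_s=0.060, uplink_bps=5_000_000)
    for label, speech_s in [("short", 5), ("medium", 120), ("long", 1800)]:
        cloud = cloud_total_s(speech_s, cafe)
        local = on_device_total_s(cafe)
        print(f"{label:>6}: cloud {cloud:6.2f} s, on-device {local:5.2f} s")
```

In this sketch the on-device total does not depend on session length, because only the last ~30 seconds are finalized when you stop, while the cloud total grows with the amount of audio to upload. The model is deliberately crude: it ignores any upload that happens while you are still speaking, so treat the long-session figures as an upper-bound illustration rather than a measurement.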