You say “Thanks.” The network says “Hold on.”
TL;DR
- For short clips, the DNS lookup, TLS handshake, and upload setup can be most of the wait.
- On-device avoids the handshake tax entirely.
- The “feel” of dictation is dominated by stop-to-text latency.
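Cloud flows typically start with a DNS lookup and a TCP/TLS handshake, then add at least one remote hop to send the audio and get text back. Each of those steps costs one or more network round trips no matter how little you said, so for very short phrases this setup time can dominate. On‑device dictation avoids the detours entirely: your audio stays local, there’s nothing to upload, and text appears as soon as local recognition finishes.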
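To make the arithmetic concrete, here is a rough back-of-the-envelope model. It is a sketch, not a measurement: the round-trip counts (one each for DNS, TCP, and TLS 1.3 on a cold connection), the 32 kbps audio bitrate, the 100 ms server time, and the example network are all assumed numbers, and the model assumes the clip is uploaded after you stop rather than streamed while you talk.

```python
from dataclasses import dataclass


@dataclass
class Network:
    rtt_ms: float       # round-trip time to the server (assumed value)
    uplink_kbps: float  # upload bandwidth in kilobits per second (assumed value)


def cloud_stop_to_text_ms(
    clip_seconds: float,
    net: Network,
    audio_kbps: float = 32.0,    # assumed compressed-audio bitrate
    setup_round_trips: int = 3,  # assumed: DNS + TCP + TLS 1.3, cold connection
    server_ms: float = 100.0,    # assumed server-side recognition time
) -> dict:
    """Estimate stop-to-text latency for a cold cloud request."""
    setup_ms = setup_round_trips * net.rtt_ms
    request_ms = net.rtt_ms  # one more round trip for the request and response
    upload_ms = clip_seconds * audio_kbps / net.uplink_kbps * 1000.0
    total_ms = setup_ms + request_ms + upload_ms + server_ms
    return {
        "setup_ms": setup_ms,
        "upload_ms": upload_ms,
        "total_ms": total_ms,
        "setup_share": setup_ms / total_ms,  # fraction of the wait spent on setup
    }


# Hypothetical mobile network: 150 ms round trip, 1 Mbps uplink.
mobile = Network(rtt_ms=150.0, uplink_kbps=1000.0)

short_clip = cloud_stop_to_text_ms(2.0, mobile)   # a two-second "Thanks."
long_clip = cloud_stop_to_text_ms(60.0, mobile)   # a one-minute note

for name, result in (("short", short_clip), ("long", long_clip)):
    print(f"{name}: {result['total_ms']:.0f} ms total, "
          f"{result['setup_share']:.0%} of it setup")
```

Under these assumptions, setup is more than half of the wait for the two-second clip and a small slice of the one-minute one. Warm connections, cached DNS, or streaming upload would all shrink the setup share, so treat the numbers as illustrative only.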
See the effect in the interactive demo (choose Short and try different networks): /blog/latency-demo
Related: Offline stays fast · Long sessions
