Interactive

Why offline dictation feels faster

Short phrases flatter the cloud. Long sessions expose it. Experiment with your own network and see where the time goes.

Use the interactive timeline below to compare a cloud dictation workflow (upload → provider processing → optional rewrite) with Voice Type’s on-device streaming. Adjust network latency and bandwidth to see how each phase contributes to total time.

Cloud: streaming + LLM rewrite

Total: 4.4 s
  • Handshakes: 240 ms
  • Service overhead: 350 ms
  • Streaming (upload+ASR): ≈3.1 s (the remainder of the 4.4 s total)
  • LLM rewrite (proxy): 700 ms

On‑device: finalize + LLM rewrite

Total: 3.3 s
  • Finalize last ~30s: 2.5 s
  • Handshake (rewrite): 120 ms
  • LLM rewrite (BYOK): 700 ms


Assumptions (realistic, simplified)

  • Streaming can overlap upload with ASR; file upload cannot.
  • Each hop costs a handshake of ≈ 2×RTT (DNS + TLS + warm-up). The cloud path has two hops (ASR and proxy); BYOK has one.
  • Cloud ASR speed is set in the timeline controls (default 200× real‑time). The on‑device path shows only the time to finalize the last ~30 s of audio (≈2.5 s).
  • Same rewrite speed for both paths (≈1200 tok/s); proxy adds only hop latency.
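
The assumptions above amount to a small additive latency model. Here is a minimal sketch in Python, not Voice Type's actual code. It assumes a 60 ms round-trip time (what the 240 ms and 120 ms handshake figures imply at 2×RTT per hop) and an 840-token rewrite (700 ms at 1200 tok/s). The function names and the `streaming_s` parameter are illustrative only.

```python
RTT_S = 0.060             # assumed round-trip time; 2 x RTT per hop gives 120 ms
REWRITE_TOK_PER_S = 1200  # rewrite speed from the assumptions list
REWRITE_TOKENS = 840      # assumed rewrite length; 840 / 1200 = 0.7 s


def handshake_s(hops: int, rtt_s: float = RTT_S) -> float:
    """Connection setup cost: roughly 2 x RTT for each network hop."""
    return hops * 2 * rtt_s


def rewrite_s(tokens: int = REWRITE_TOKENS) -> float:
    """Time for the LLM rewrite, identical on both paths."""
    return tokens / REWRITE_TOK_PER_S


def cloud_total_s(streaming_s: float, overhead_s: float = 0.350) -> float:
    """Cloud path: two hops (ASR + proxy), service overhead, streaming, rewrite."""
    return handshake_s(hops=2) + overhead_s + streaming_s + rewrite_s()


def on_device_total_s(finalize_s: float = 2.5) -> float:
    """On-device path: local finalize, one BYOK hop, rewrite."""
    return finalize_s + handshake_s(hops=1) + rewrite_s()
```

With these defaults, `on_device_total_s()` comes to about 3.32 s, in line with the 3.3 s shown in the timeline. A streaming phase of roughly 3.1 s would account for the 4.4 s cloud total.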

Short phrases (5–15 seconds): handshakes dominate cloud flows. On-device transcription avoids them entirely; only an optional BYOK rewrite adds a single hop.

Long sessions: upload size and proxy hops compound the latency. On-device transcription streams live and finalizes only the last ~30 seconds when you stop (about 2–3 seconds on an M1 Mac).
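
To see why session length matters, here is a rough sketch of how each path's transcription phase scales with recording duration. Several inputs are assumptions rather than figures from this page: the audio bitrate (256 kbit/s, i.e. 16 kHz 16-bit mono PCM), the uplink speed passed in by the caller, and the linear scaling of the finalize cost for clips shorter than 30 seconds. Streaming is modeled as fully overlapping upload with ASR, and file upload as strictly sequential.

```python
AUDIO_KBPS = 256        # assumption: 16 kHz, 16-bit mono PCM
ASR_SPEEDUP = 200       # cloud ASR default: 200x real-time
FINALIZE_WINDOW_S = 30  # on-device finalizes only the last ~30 s
FINALIZE_COST_S = 2.5   # approximate cost of finalizing a full window


def upload_s(duration_s: float, uplink_kbps: float) -> float:
    """Time to push the whole recording over the uplink."""
    return duration_s * AUDIO_KBPS / uplink_kbps


def asr_s(duration_s: float) -> float:
    """Cloud transcription time at the assumed real-time multiple."""
    return duration_s / ASR_SPEEDUP


def cloud_file_s(duration_s: float, uplink_kbps: float) -> float:
    """File upload: transcription cannot start until the upload ends."""
    return upload_s(duration_s, uplink_kbps) + asr_s(duration_s)


def cloud_streaming_s(duration_s: float, uplink_kbps: float) -> float:
    """Streaming: upload and ASR overlap, so the slower one sets the pace."""
    return max(upload_s(duration_s, uplink_kbps), asr_s(duration_s))


def on_device_finalize_s(duration_s: float) -> float:
    """On-device: cost is capped by the ~30 s finalize window."""
    tail_s = min(duration_s, FINALIZE_WINDOW_S)
    return FINALIZE_COST_S * tail_s / FINALIZE_WINDOW_S
```

Under these assumptions, a 10-minute recording over a 2 Mbit/s uplink takes 76.8 s to stream and 79.8 s as a file upload, while `on_device_finalize_s(600)` stays at 2.5 s. The cloud figures grow with duration; the on-device figure does not.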

With bring-your-own-key rewrites, Voice Type sends text directly from your Mac to your chosen provider in a single hop. Audio stays local.