Use the interactive timeline below to compare a cloud dictation workflow (upload → provider processing → optional rewrite) with Voice Type’s on-device streaming. Adjust network latency and bandwidth to see how each phase contributes to total time.
Example readout from the timeline: cloud (streaming + LLM rewrite), 4.4 s total; on-device (finalize + LLM rewrite), 3.3 s total.
Assumptions (realistic, simplified)
- Streaming can overlap upload with ASR; file upload cannot.
- Each hop costs a handshake of roughly 2×RTT (DNS + TLS + connection warm-up). The cloud path has an ASR hop plus a proxy hop; bring-your-own-key (BYOK) rewrites use a single hop.
- Cloud ASR speed is set with the control above (default 200× real time). The on-device path shows only the finalize step for the last ~30 s of audio (≈2.5 s).
- Same rewrite speed for both paths (≈1200 tok/s); proxy adds only hop latency.
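Taken together, these assumptions form a simple additive latency model. Below is a hypothetical Python sketch of it, not the timeline's actual code. The 2×RTT-per-hop handshake, 200× real-time ASR, ≈1200 tok/s rewrite and ≈2.5 s finalize come from the list; the `Network` type, the function names and the 256 kbit/s file-upload bitrate (16 kHz, 16-bit mono PCM) are assumptions. For simplicity it charges the full cloud ASR time after you stop rather than modelling how much of it overlaps with speaking.

```python
from dataclasses import dataclass

# Figures from the assumptions list; everything else is a hypothetical knob.
REWRITE_TOK_PER_S = 1200.0    # same rewrite speed on both paths
ONDEVICE_FINALIZE_S = 2.5     # on-device finalize of the last ~30 s


@dataclass
class Network:
    rtt_s: float       # round-trip time in seconds
    uplink_bps: float  # upload bandwidth in bits per second


def handshake_s(net: Network, hops: int) -> float:
    """Connection setup: about 2 x RTT per hop (DNS + TLS + warm-up)."""
    return hops * 2.0 * net.rtt_s


def rewrite_s(tokens: int) -> float:
    """LLM rewrite time at a fixed token throughput."""
    return tokens / REWRITE_TOK_PER_S


def cloud_phases(net: Network, audio_s: float, tokens: int, *,
                 streaming: bool = True, asr_speedup: float = 200.0,
                 audio_bitrate_bps: float = 256_000.0) -> dict[str, float]:
    """Cloud path: ASR hop + proxy hop; the proxy adds hop latency only."""
    # Hypothetical default bitrate: 16 kHz, 16-bit mono PCM = 256 kbit/s.
    upload = audio_s * audio_bitrate_bps / net.uplink_bps
    return {
        "handshakes": handshake_s(net, hops=2),
        # Streaming overlaps upload with ASR; a file upload is paid up front.
        "upload": 0.0 if streaming else upload,
        "asr": audio_s / asr_speedup,  # provider ASR at N x real time
        "rewrite": rewrite_s(tokens),
    }


def ondevice_phases(net: Network, tokens: int) -> dict[str, float]:
    """On-device path: local finalize, then a single BYOK hop for the rewrite."""
    return {
        "finalize": ONDEVICE_FINALIZE_S,
        "handshakes": handshake_s(net, hops=1),
        "rewrite": rewrite_s(tokens),
    }


if __name__ == "__main__":
    net = Network(rtt_s=0.1, uplink_bps=5_000_000)
    for name, phases in [
        ("cloud (streaming)", cloud_phases(net, audio_s=60, tokens=300)),
        ("cloud (file upload)", cloud_phases(net, 60, 300, streaming=False)),
        ("on-device + BYOK", ondevice_phases(net, tokens=300)),
    ]:
        breakdown = ", ".join(f"{k} {v:.2f}s" for k, v in phases.items())
        print(f"{name}: total {sum(phases.values()):.2f}s ({breakdown})")
```

Each function returns a per-phase breakdown, so you can vary `rtt_s` or `uplink_bps` and see which term dominates, much as the timeline's controls do. The numbers it prints come from this toy model, not from measurements.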
Short phrases (5–15 seconds): connection handshakes dominate cloud flows. On-device transcription avoids them entirely; only an optional rewrite needs a network hop.
Long sessions: upload size and proxy hops add up. On-device transcription streams live and finalizes only the last ~30 seconds when you stop (about 2–3 seconds on an M1 Mac).
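To illustrate why session length matters, here is a hypothetical sketch comparing a serial file upload, which grows with the recording, against the on-device finalize, which stays bounded by the ~30 s window. Only the ~30 s window and the ≈2.5 s figure come from the text; the uncompressed 256 kbit/s bitrate, the 5 Mbit/s uplink and the linear scaling of finalize time inside the window are assumptions.

```python
# Hypothetical constants; only the window length and ~2.5 s come from the text.
FINALIZE_WINDOW_S = 30.0        # on-device finalizes only the last ~30 s
FINALIZE_FULL_WINDOW_S = 2.5    # approx. time to finalize a full window (M1)
PCM_BITRATE_BPS = 256_000.0     # assumed 16 kHz, 16-bit mono, uncompressed


def file_upload_s(session_s: float, uplink_bps: float,
                  bitrate_bps: float = PCM_BITRATE_BPS) -> float:
    """A file upload sends the whole recording, so it grows linearly."""
    return session_s * bitrate_bps / uplink_bps


def ondevice_finalize_s(session_s: float) -> float:
    """Only the tail is finalized; linear scaling inside the window is assumed."""
    tail = min(session_s, FINALIZE_WINDOW_S)
    return FINALIZE_FULL_WINDOW_S * tail / FINALIZE_WINDOW_S


if __name__ == "__main__":
    uplink = 5_000_000.0  # assumed 5 Mbit/s upload
    for minutes in (1, 5, 15, 30):
        s = minutes * 60.0
        print(f"{minutes:>2} min: file upload {file_upload_s(s, uplink):6.2f} s, "
              f"on-device finalize {ondevice_finalize_s(s):.2f} s")
```

Compressed audio would shrink the upload term, but under any fixed bitrate it still scales with session length, while the finalize term is capped by the window.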
With bring-your-own-key rewrites, Voice Type sends text directly from your Mac to your chosen provider in a single hop. Audio stays local.
