Long sessions: uploads vs 30‑second windowing
Why streaming on‑device and finalizing only the last ~30 seconds keeps long dictations responsive.

30 Sept 2025


Half an hour of clean audio is not a “quick upload.”

Uploading long, high‑quality audio takes time, especially on variable Wi‑Fi. Many cloud tools avoid heavy compression to protect accuracy, which makes the upload larger still.
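To put a rough number on that, here is a back‑of‑the‑envelope sketch. Every figure in it is an assumption chosen for illustration (16 kHz, 16‑bit, mono, uncompressed audio and a steady 5 Mbps uplink), not a measurement of any particular cloud tool, and `upload_seconds` is a made‑up helper name.

```python
def upload_seconds(duration_s, sample_rate_hz=16_000, bytes_per_sample=2,
                   channels=1, uplink_mbps=5.0):
    """Estimate the time to upload uncompressed PCM audio.

    All defaults are illustrative assumptions, not measured values.
    """
    size_bytes = duration_s * sample_rate_hz * bytes_per_sample * channels
    size_bits = size_bytes * 8
    return size_bits / (uplink_mbps * 1_000_000)


if __name__ == "__main__":
    half_hour = 30 * 60  # seconds
    print(f"{upload_seconds(half_hour):.1f} s to upload")
```

Under those assumptions a half‑hour recording is about 57.6 MB and needs roughly a minute and a half of upload before any transcription can begin. A slower or less stable connection pushes that up; compression pulls it down.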

Voice Type stays on‑device and processes audio as a continuous stream while you speak. When you stop, it only has to finish the last ~30‑second window (≈2–3 s on an M1), however long the session ran. That’s why long sessions feel snappy in practice.
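The key property is that the work left at stop time is capped by the window length, not by the session length. The sketch below illustrates that idea only; it is not Voice Type's implementation. The function names are hypothetical, the 2.5 s figure is simply the midpoint of the ≈2–3 s range quoted above, and scaling the cost linearly for sessions shorter than one window is an assumption.

```python
def pending_audio_seconds(session_s, window_s=30.0):
    """Audio still waiting to be finalized when the user stops.

    Everything before the last window is assumed to be finalized already.
    """
    return min(session_s, window_s)


def estimated_stop_latency(session_s, window_s=30.0, finalize_s_per_window=2.5):
    """Rough wait after pressing stop.

    finalize_s_per_window is an illustrative midpoint of the 2-3 s range;
    linear scaling for partial windows is an assumption.
    """
    pending = pending_audio_seconds(session_s, window_s)
    return pending / window_s * finalize_s_per_window


if __name__ == "__main__":
    for minutes in (1, 30, 60):
        wait = estimated_stop_latency(minutes * 60)
        print(f"{minutes} min session: about {wait:.1f} s after stop")
```

In this model a one‑minute session and a one‑hour session wait the same amount after stop, whereas an upload‑first approach grows with every extra minute recorded.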

Explore the difference (choose Medium or Long in the demo): /blog/latency-demo


© 2025 Careless Whisper Inc.