If the input is messy, the output will be too.

TL;DR

Normalize loudness so words land at consistent levels.
Cut low-frequency rumble (desk thumps, HVAC) with a light high-pass filter.
Use noise-aware voice activity detection (VAD) so silence and background don’t get “transcribed.”
Improve the signal before recognition; don’t rely on post-processing to “fix” mistakes.

Voice Type normalizes loudness to a consistent target and applies a light high-pass filter to reduce low-frequency rumble. Combined with noise-aware voice activity detection, this gives the model input closer to what it was trained on, which means fewer garbled words and more stable punctuation.
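To make the two conditioning steps concrete, here is a minimal sketch in plain Python. It is not Voice Type's actual implementation: the 80 Hz cutoff, the -20 dBFS target, the peak limit, and all function names are assumptions chosen for illustration. It also uses simple RMS level rather than a perceptual loudness measure such as LUFS.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=80.0):
    """First-order high-pass filter. cutoff_hz is an assumed value."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Passes fast changes (speech), attenuates slow drift (rumble).
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_rms(samples, target_dbfs=-20.0, peak_limit=0.99):
    """Scale samples to an assumed RMS target without exceeding peak_limit."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target = 10 ** (target_dbfs / 20.0)
    gain = target / level
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_limit:
        # Back off the gain rather than clip the waveform.
        gain = peak_limit / peak
    return [s * gain for s in samples]

def condition(samples, sample_rate):
    """Filter first, then normalize, so rumble does not skew the level."""
    return normalize_rms(high_pass(samples, sample_rate))
```

The order matters in this sketch: filtering before normalizing keeps low-frequency energy from counting toward the measured level, which would otherwise leave the speech itself quieter than the target.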

We avoid heavy “prompt fixes” that can make transcripts look more confident while being less faithful to what was actually said. Instead, we improve the signal before recognition.

What you can do today

Speak closer to the mic, not louder. A cleaner signal beats a louder one.
Reduce room noise (fans, keyboard clacks) where possible.
If your tool offers it, enable VAD/noise suppression and keep it gentle: aggressive settings clip consonants, which hurts accuracy.
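What "noise-aware" and "gentle" can mean in practice is easier to see in code. The sketch below is a simple energy-based detector, not RNNoise and not Voice Type's detector. It tracks a running noise floor, flags frames that rise clearly above it, and holds the speech flag for a few extra frames so that quiet word endings are not cut off. The margin, hangover length, adaptation rate, and function names are all assumptions.

```python
import math

def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels

def detect_speech(levels, margin_db=9.0, hangover_frames=8,
                  floor_init_frames=10, adapt=0.05):
    """Return one True/False speech flag per frame level.

    Assumes the first floor_init_frames frames are background noise.
    """
    if not levels:
        return []
    init = levels[:floor_init_frames]
    noise = max(sum(init) / len(init), 1e-6)
    ratio = 10 ** (margin_db / 20.0)
    flags = []
    hang = 0
    for level in levels:
        if level > noise * ratio:
            # Clearly above the noise floor: speech, and re-arm the hangover.
            hang = hangover_frames
            flags.append(True)
        else:
            # Update the noise floor slowly, and only on quiet frames.
            noise = max((1.0 - adapt) * noise + adapt * level, 1e-6)
            if hang > 0:
                # Hangover: keep the tail of the word instead of clipping it.
                hang -= 1
                flags.append(True)
            else:
                flags.append(False)
    return flags
```

In this sketch, a larger hangover_frames or a smaller margin_db is the "gentle" direction: more background gets through, but fewer soft consonants are dropped.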

Related: RNNoise VAD · Accuracy examples

Cleaner input, cleaner transcripts: audio conditioning for accuracy
