
Product comparison

Voice Type vs SuperWhisper

Two on-device Whisper apps for Mac. Same underlying model architecture, different approaches to workflow and audio processing.

Both Voice Type and SuperWhisper run OpenAI Whisper locally on your Mac using Apple's Core ML. The difference is in what happens before and after recognition: audio conditioning, hotkey behavior, and optional AI features.

Short answer

  • Pick Voice Type if you want hold-to-dictate hotkeys, audio conditioning for noisy environments, and a single one-time price.
  • Pick SuperWhisper if you prefer tiered pricing, want AI rewriting features, or need the Pro tier's unlimited transcription.

At a glance

Finalization speed

Voice Type finalizes in under 2 seconds regardless of how long you dictate. The streaming architecture processes audio in chunks, so only the last segment needs finalizing when you stop.

Recognition accuracy

Voice Type uses beam search decoding for higher accuracy on complex phrases. Combined with RNNoise noise suppression and loudness normalization, this helps technical terms transcribe correctly.

Punctuation handling

Voice Type offers near-parity with Dragon Dictate for spoken punctuation, a feature professional dictation users expect. Say 'period', 'comma', or 'new paragraph' naturally.
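One common way to handle spoken punctuation is a post-processing pass over the recognized text. The sketch below illustrates that idea only; it is not Voice Type's implementation. It assumes the recognizer returns plain lowercase words, and the command table `SPOKEN_PUNCTUATION` and the function `apply_spoken_punctuation` are hypothetical names.

```python
import re

# Hypothetical command table mapping spoken phrases to symbols.
SPOKEN_PUNCTUATION = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "question mark": "?",
    "exclamation point": "!",
    "period": ".",
    "comma": ",",
    "colon": ":",
}

# Longer phrases are tried first so multi-word commands match as a whole.
_COMMAND = re.compile(
    r"\s*\b("
    + "|".join(re.escape(p) for p in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True))
    + r")\b\s*",
    re.IGNORECASE,
)


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation commands in raw transcript text with symbols."""

    def substitute(match):
        symbol = SPOKEN_PUNCTUATION[match.group(1).lower()]
        # Line breaks stand alone; other marks attach to the previous word
        # and are followed by a single space.
        return symbol if symbol.startswith("\n") else symbol + " "

    out = _COMMAND.sub(substitute, text)
    out = re.sub(r" +\n", "\n", out).strip()  # drop spaces left before line breaks
    # Capitalize the first letter of the text and of each new sentence or line.
    return re.sub(
        r"(^|[.?!]\s+|\n)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        out,
    )
```

For example, "hello comma world period new paragraph how are you question mark" becomes "Hello, world." followed by a blank line and "How are you?".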

Custom vocabulary

Voice Type supports custom word priming, following Whisper best practices for prompt conditioning. Add product names, technical terms, and jargon so they transcribe correctly.

Audio preprocessing

Voice Type applies LUFS normalization, RNNoise noise suppression, and silence trimming before recognition. SuperWhisper relies on the raw Whisper model without preprocessing.

Pricing

Voice Type: $19.99 one-time. SuperWhisper: tiered pricing at $9.99 (Basic), $19.99 (Standard), and $29.99 (Pro), each with a different feature set.

Audio preprocessing

Voice Type conditions audio before it reaches the Whisper model. This includes loudness normalization via LUFS metering, background noise reduction using RNNoise (a recurrent neural network trained specifically for speech denoising), and silence trimming. The goal: cleaner input produces more accurate output, especially in non-ideal recording conditions.
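The conditioning steps can be illustrated with a simplified sketch. It is not Voice Type's pipeline and makes several loudly stated substitutions: it measures plain RMS level instead of LUFS (true LUFS metering, per ITU-R BS.1770, adds K-weighting and gating), it omits RNNoise (a separate C library that would run between trimming and level measurement), and the function names and thresholds are hypothetical. Audio is assumed to be a list of float samples in [-1, 1] at 16 kHz, so 160 samples is a 10 ms frame.

```python
import math
from typing import List


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dBFS; a stand-in for LUFS, which adds K-weighting and gating."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def trim_silence(samples: List[float], frame_size: int = 160,
                 threshold_dbfs: float = -50.0) -> List[float]:
    """Drop leading and trailing frames whose level falls below the threshold."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    voiced = [i for i, frame in enumerate(frames) if rms_dbfs(frame) > threshold_dbfs]
    if not voiced:
        return []
    start = voiced[0] * frame_size
    end = min(len(samples), (voiced[-1] + 1) * frame_size)
    return samples[start:end]


def normalize_level(samples: List[float], target_dbfs: float = -20.0,
                    peak_limit: float = 0.99) -> List[float]:
    """Apply a single gain so the clip reaches the target level without clipping."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return list(samples)  # silence: nothing to scale
    gain = 10 ** ((target_dbfs - level) / 20)
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_limit:
        gain = peak_limit / peak  # cap the gain so the loudest sample stays below full scale
    return [s * gain for s in samples]


def condition(samples: List[float]) -> List[float]:
    # Trim first so leading/trailing silence does not pull the measured level down.
    # A denoiser such as RNNoise would slot in here, before the level measurement.
    return normalize_level(trim_silence(samples))
```

Trimming before normalizing matters: a clip that starts with two seconds of silence would otherwise measure quieter than the speech actually is and receive too much gain.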

SuperWhisper passes audio directly to the Whisper model. This works well in quiet environments but may produce more errors with background noise or inconsistent microphone levels.

Speed architecture

Voice Type finalizes in under 2 seconds no matter how long you've been dictating. The architecture streams audio in ~30-second windows, processing each chunk as you speak. When you release the hotkey, only the final segment needs processing—the rest is already done.

This streaming approach means consistent latency whether you dictate for 10 seconds or 10 minutes. Cloud-based tools often have variable latency that scales with audio length.
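A minimal sketch of the chunked approach, assuming a hypothetical `transcribe_window` callback that wraps whatever Whisper backend is in use. It cuts fixed 30-second windows at 16 kHz (Whisper's input sample rate) with no overlap. This is not Voice Type's code; a production system would likely overlap windows or cut at pauses so words are not split at window boundaries.

```python
from typing import Callable, List


class ChunkedTranscriber:
    """Hypothetical streaming wrapper: transcribes each full window as audio arrives."""

    def __init__(self, transcribe_window: Callable[[List[float]], str],
                 sample_rate: int = 16000, window_seconds: float = 30.0) -> None:
        self.transcribe_window = transcribe_window
        self.window_size = int(sample_rate * window_seconds)
        self.pending: List[float] = []   # samples not yet transcribed
        self.segments: List[str] = []    # text from completed windows

    def feed(self, samples: List[float]) -> None:
        """Called while the hotkey is held; processes every window that fills up."""
        self.pending.extend(samples)
        while len(self.pending) >= self.window_size:
            window = self.pending[:self.window_size]
            self.pending = self.pending[self.window_size:]
            self.segments.append(self.transcribe_window(window))

    def finalize(self) -> str:
        """Called on hotkey release; only the leftover partial window is processed."""
        if self.pending:
            self.segments.append(self.transcribe_window(self.pending))
            self.pending = []
        return " ".join(s for s in self.segments if s)
```

Because `feed` transcribes each window as soon as it fills, `finalize` never has more than one window's worth of audio left to process, which is why release-to-text latency stays roughly constant however long the dictation runs.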

Beam search and accuracy

Voice Type uses beam search decoding rather than greedy decoding. Beam search explores multiple possible transcriptions simultaneously and selects the most likely sequence, improving accuracy on ambiguous or technical phrases.
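To make the greedy-versus-beam difference concrete, here is a toy sketch. `toy_model` is a hypothetical lookup table standing in for a Whisper decoder's next-token log-probabilities; real decoders add refinements such as length penalties that this sketch omits.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
END = "<eos>"
NextLogProbs = Callable[[Seq], Dict[str, float]]

# Hypothetical toy "decoder": next-token probabilities given the tokens so far.
TABLE: Dict[Seq, Dict[str, float]] = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"z": 0.9, "w": 0.1},
}


def toy_model(prefix: Seq) -> Dict[str, float]:
    if prefix in TABLE:
        return {tok: math.log(p) for tok, p in TABLE[prefix].items()}
    return {END: 0.0}  # log(1.0): every two-token sequence ends here


def greedy_decode(next_log_probs: NextLogProbs, max_len: int) -> Tuple[Seq, float]:
    seq: Seq = ()
    score = 0.0
    for _ in range(max_len):
        # Commit to the single most likely next token.
        tok, lp = max(next_log_probs(seq).items(), key=lambda kv: kv[1])
        seq, score = seq + (tok,), score + lp
        if tok == END:
            break
    return seq, score


def beam_search(next_log_probs: NextLogProbs, beam_width: int,
                max_len: int) -> Tuple[Seq, float]:
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:  # keep only the top-scoring few
            if seq[-1] == END:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open at max_len
    return max(finished, key=lambda c: c[1])
```

Greedy decoding commits to "a" because it is the best first token (0.6) and ends with a sequence probability of 0.6 × 0.55 = 0.33. Beam search with a width of 2 keeps "b" alive as a second hypothesis and finds the more likely sequence "b z" (0.4 × 0.9 = 0.36). With `beam_width=1`, beam search reduces to greedy decoding.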

Combined with prompt conditioning for custom vocabulary (following OpenAI's Whisper documentation), this helps technical terms, product names, and domain-specific jargon transcribe correctly.
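A sketch of turning a user's term list into a conditioning prompt. The function name, the "Glossary: " prefix, and the word budget are hypothetical, and counting words is only a rough proxy for Whisper's tokenizer. OpenAI's Whisper prompting guidance notes that the model considers only the final 224 tokens of a prompt, so a budget keeps a long list from being silently truncated. With the open-source `openai-whisper` package, a string like this could be passed as the `initial_prompt` argument to `transcribe`.

```python
from typing import Iterable


def build_vocabulary_prompt(terms: Iterable[str], max_words: int = 150) -> str:
    """Join custom terms into a short prompt, deduplicated and kept under a word budget."""
    seen = set()
    kept = []
    used = 0
    for raw in terms:
        term = raw.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        words = len(term.split())
        if used + words > max_words:
            break  # stop before exceeding the budget (a rough proxy for tokens)
        seen.add(key)
        kept.append(term)
        used += words
    return ("Glossary: " + ", ".join(kept) + ".") if kept else ""
```

For example, `build_vocabulary_prompt(["Core ML", "RNNoise", "core ml", " LUFS "])` returns "Glossary: Core ML, RNNoise, LUFS." Listing terms in the spelling you want matters, since the prompt nudges the decoder toward those exact forms.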

Model options

SuperWhisper lets you choose from multiple Whisper model sizes (tiny, base, small, medium, large). Smaller models are faster but less accurate; larger models are slower but handle complex vocabulary better.

Voice Type ships with an optimized model tuned for the hold-to-dictate workflow, where consistent sub-2-second finalization matters. Its beam search configuration and audio preprocessing are intended to offset the accuracy trade-offs of choosing a model for speed.

Who should choose what

Choose Voice Type if…

  • You need sub-2-second finalization regardless of dictation length.
  • You want near-Dragon-level spoken punctuation ('period', 'new paragraph').
  • You dictate technical terms and need custom vocabulary priming.
  • You work in noisy environments and need audio preprocessing.

Choose SuperWhisper if…

  • You want AI-powered text rewriting or formatting.
  • You need to choose from multiple Whisper model sizes.
  • You prefer tiered pricing based on features you need.
  • You want toggle-mode dictation instead of hold-to-talk.


Try the free 7-day trial