Model Guide
Choose the right speech model for your transcription needs.
MurmurTone uses state-of-the-art AI models for speech recognition. Each model offers different trade-offs between speed, accuracy, and resource usage. This guide helps you pick the best one for your workflow.
If you are unsure, start with the Recommended model. It offers the best balance of accuracy and speed, and works well on most modern computers.
Model Comparison
| Model | Speed | Accuracy | RAM | Best For |
|---|---|---|---|---|
| Quick | Fastest | Good | ~1 GB | Quick notes, chat messages |
| Standard | Fast | Better | ~1.5 GB | Casual dictation, everyday use |
| Recommended | Moderate | Great | ~2.5 GB | Documentation, emails, general work |
| Professional | Slower | Excellent | ~5 GB | Important documents, reports |
| Studio | Slowest | Best | ~10 GB | Professional transcription, accents |
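As a rough illustration of how the RAM column can drive the choice, the sketch below picks the most accurate model that fits in the memory you have free. It is not part of MurmurTone: the function name `largest_model_that_fits` and the 1 GB `headroom_gb` default are assumptions made for this example, and the RAM figures are the approximate values from the table.

```python
# Approximate RAM use in GB, copied from the comparison table.
# Listed from least to most accurate, so the last fitting entry is the best one.
MODEL_RAM_GB = {
    "Quick": 1.0,
    "Standard": 1.5,
    "Recommended": 2.5,
    "Professional": 5.0,
    "Studio": 10.0,
}


def largest_model_that_fits(free_ram_gb, headroom_gb=1.0):
    """Return the most accurate model that fits in free RAM, or None.

    headroom_gb is an assumed safety margin left for other applications;
    it is not a MurmurTone setting.
    """
    budget = free_ram_gb - headroom_gb
    fitting = [name for name, ram in MODEL_RAM_GB.items() if ram <= budget]
    return fitting[-1] if fitting else None


print(largest_model_that_fits(8.0))   # Professional
print(largest_model_that_fits(4.0))   # Recommended
```

RAM is only one of the trade-offs: a model that fits may still be slower than you want, so treat the result as an upper bound rather than a recommendation.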
Which Model Should I Use?
I need fast transcription for quick notes
Use Quick or Standard. These models transcribe almost instantly and are perfect for casual use like chat messages, quick reminders, or informal notes where minor errors are acceptable.
I want the best balance of speed and accuracy
Use Recommended. It suits most users: it handles professional vocabulary well, runs efficiently on most computers, and produces accurate results for emails, documentation, and everyday work.
I need high accuracy for important documents
Use Professional or Studio. These models excel at complex vocabulary, technical terms, and proper nouns. Choose Professional if you have 8+ GB RAM, or Studio if you have a powerful GPU and need the absolute best accuracy.
I work with accents or non-native English speakers
Use Studio. Larger models handle accent variation significantly better. If Studio is too slow, Professional is a good compromise.
I need to translate speech to English
Use Recommended or larger. All models support translation from 90+ languages to English, but larger models produce more accurate translations. Set your Language to the source language (e.g., Spanish) and enable the Translation option in settings.
Technical Specifications
For power users, here are the detailed specifications of each model:
| Model | Parameters | Download Size | WER* | Languages |
|---|---|---|---|---|
| Quick | 39 M | 75 MB | ~10% | 99 |
| Standard | 74 M | 145 MB | ~7% | 99 |
| Recommended | 244 M | 484 MB | ~5% | 99 |
| Professional | 769 M | 1.5 GB | ~4% | 99 |
| Studio | 1.55 B | 3 GB | ~2.7% | 99 |
*WER (Word Error Rate) measured on LibriSpeech clean test set. Lower is better. Real-world accuracy varies based on audio quality, background noise, and vocabulary.
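To make the WER column concrete: word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. A WER of ~5% therefore means roughly one wrong word in twenty. The sketch below computes it with a standard word-level edit distance; the function name `word_error_rate` is chosen for this example, and the simple lowercase-and-split normalization is an assumption (published benchmarks typically apply more careful text normalization).

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "please send the report by friday"
transcript = "please send a report friday"
# One substitution ("the" -> "a") and one missing word ("by"): 2 errors in 6 words.
print(round(word_error_rate(reference, transcript), 3))  # 0.333
```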
GPU Acceleration
If you have an NVIDIA GPU with 4+ GB VRAM, MurmurTone can use it to dramatically speed up transcription:
- Quick/Standard: 10-20x faster with GPU
- Recommended: 8-15x faster with GPU
- Professional: 6-10x faster with GPU (requires 6+ GB VRAM)
- Studio: 5-8x faster with GPU (requires 10+ GB VRAM)
Set your Processing Mode to "Auto" and MurmurTone will automatically use your GPU if available.
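The speedup ranges above translate into time estimates with simple division. The following sketch is an illustration only and does not describe how the "Auto" mode decides: the table `GPU_PROFILE` and the function `estimated_gpu_seconds` are hypothetical names, and the numbers are the VRAM minimums and speedup ranges listed in this section.

```python
# (minimum VRAM in GB, low speedup, high speedup), copied from the list above.
GPU_PROFILE = {
    "Quick": (4, 10, 20),
    "Standard": (4, 10, 20),
    "Recommended": (4, 8, 15),
    "Professional": (6, 6, 10),
    "Studio": (10, 5, 8),
}


def estimated_gpu_seconds(model, cpu_seconds, vram_gb):
    """Return (best_case, worst_case) GPU time in seconds, or None.

    None means the GPU has less VRAM than this section lists for the model.
    """
    min_vram, low, high = GPU_PROFILE[model]
    if vram_gb < min_vram:
        return None
    return (cpu_seconds / high, cpu_seconds / low)


# A clip that takes 80 s on CPU with Studio, on a 12 GB GPU:
print(estimated_gpu_seconds("Studio", 80, 12))  # (10.0, 16.0)
# The same clip on an 8 GB GPU, which is below the 10 GB listed for Studio:
print(estimated_gpu_seconds("Studio", 80, 8))   # None
```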
The Recommended model works reliably on most systems with 8 GB RAM.
Downloading Models
The Quick and Standard models are included with MurmurTone. Larger models are downloaded on demand when you select them in Settings.
Download sizes:
- Recommended: ~484 MB
- Professional: ~1.5 GB
- Studio: ~3 GB
Models are cached locally after download, so you only need to download once.