French startup Mistral AI today released Voxtral, a family of open-source speech-understanding models that it says deliver record-breaking accuracy at less than half the cost of OpenAI Whisper or ElevenLabs Scribe. The announcement is already shaking the $8-billion voice-tech market.
Mistral’s internal tests, dubbed the “Voxtral Triangle Benchmark,” compare three corners of the market:
Across 16 public data sets, Voxtral Small achieves:
The numbers are even stronger on multilingual Common Voice, where Voxtral outperforms Whisper in every language tested, including low-resource Hindi and Arabic.
Unlike traditional pipelines that chain an ASR model with a separate LLM, Voxtral fuses both steps into a single neural network. The result:
For privacy-first teams, both the 3B (Voxtral Mini) and 24B (Voxtral Small) checkpoints can be downloaded from Hugging Face and run offline.
For cloud users, Mistral’s new transcription-only endpoint starts at $0.001 per minute of audio. That undercuts AWS Transcribe, and early adopters report better accuracy as well.
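To put that per-minute price in perspective, here is a minimal shell sketch that turns hours of audio into an estimated bill. It assumes a flat rate equal to the $0.001-per-minute starting price quoted above; the function name `voxtral_cost` is hypothetical, and real invoices may involve other tiers or charges, so check Mistral's current pricing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any Mistral tooling): estimate transcription
# cost at a flat per-minute rate. Default rate assumes the $0.001/min quoted above.
DEFAULT_RATE_PER_MIN="0.001"

voxtral_cost() {
  local hours="$1"
  local rate="${2:-$DEFAULT_RATE_PER_MIN}"
  # cost = hours * 60 minutes/hour * rate; bash has no floats, so awk does the math
  awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * 60 * r }'
}

# Usage: estimated dollars for 1 hour and for 1,000 hours of audio
voxtral_cost 1      # prints 0.06
voxtral_cost 1000   # prints 60.00
```

The optional second argument lets you plug in any other provider's per-minute rate for a side-by-side estimate.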
Developers can:
```bash
curl https://api.mistral.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $KEY" \
  -F file=@meeting.wav \
  -F model="voxtral-small"
```
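The single curl request above extends naturally to a folder of recordings. Below is a minimal sketch, assuming the endpoint, `Authorization` header, and form fields behave as in that example; the function names `transcribe_file` and `transcribe_all` and the `DRY_RUN` switch are hypothetical, and the script simply saves each raw JSON response without assuming anything about its fields. The default model ID is copied from the article and may differ from the IDs in Mistral's documentation.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper around the curl request shown above. Endpoint,
# header, and form fields are copied from that example; the default model ID
# "voxtral-small" is taken from the article and may differ in Mistral's docs.
API_URL="https://api.mistral.ai/v1/audio/transcriptions"
MODEL="${MODEL:-voxtral-small}"

# Send one audio file; save the raw JSON response next to it (meeting.wav -> meeting.json).
transcribe_file() {
  local audio="$1"
  local out="${audio%.*}.json"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it, e.g. to check flags before spending credits.
    echo "curl -sS --fail $API_URL -H 'Authorization: Bearer \$KEY' -F file=@$audio -F model=$MODEL -o $out"
    return 0
  fi
  curl -sS --fail "$API_URL" \
    -H "Authorization: Bearer $KEY" \
    -F "file=@$audio" \
    -F "model=$MODEL" \
    -o "$out"
}

# Transcribe every .wav file in a directory (default: current directory).
transcribe_all() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    transcribe_file "$f" || echo "failed: $f" >&2
  done
}
```

For example, after `export KEY=...`, running `transcribe_all ./recordings` writes one `.json` file per `.wav`; prefix the call with `DRY_RUN=1` to print the requests without sending them.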
Mistral also teased enterprise add-ons launching later this year:
A live demo webinar with Inworld (August 6) will showcase end-to-end speech-to-speech agents.
If you’re building voice bots, meeting assistants, or multilingual support tools, Voxtral is, by Mistral’s published benchmarks, the first open model to match premium closed APIs on accuracy while slashing costs.
And because it’s licensed under Apache 2.0, you are free to self-host it, fine-tune it, and ship it in commercial products.
Ready to test? Grab the model, spin up the API, or just open Le Chat and start talking.