Voxtral Transcribes at the Speed of Sound: Introducing Voxtral Transcribe 2
We're thrilled to announce the release of Voxtral Transcribe 2, our next generation of speech-to-text models. The family pairs a batch transcription model with a streaming model, and is built for accuracy, efficiency, and low latency across 13 languages.
Voxtral Transcribe 2: The Ultimate Transcription Suite
Voxtral Transcribe 2 comprises two powerful models: Voxtral Mini Transcribe V2 and Voxtral Realtime.
- Voxtral Mini Transcribe V2: This model excels in batch transcription tasks, boasting state-of-the-art accuracy with speaker diarization, context biasing, and word-level timestamps in 13 languages. It's the perfect choice for meeting transcription, interview analysis, and multi-party call processing.
- Voxtral Realtime: Purpose-built for real-time applications, Voxtral Realtime offers ultra-low latency, configurable down to sub-200ms. It's suited to voice agents and other live scenarios, with accuracy that approaches offline (batch) transcription.
Key Features at a Glance:
- Best-in-Class Efficiency: Voxtral Mini Transcribe V2 combines state-of-the-art word error rate with API pricing of $0.003 per minute.
- Open Weights: Voxtral Realtime is released under the Apache 2.0 license, allowing for easy deployment on edge devices for privacy-first applications.
- Multilingual Excellence: Both models support 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
- Noise Robustness: Voxtral Transcribe 2 maintains transcription accuracy in noisy, acoustically challenging environments.
- Longer Audio Support: Process recordings up to 3 hours in a single request, making it suitable for extensive audio projects.
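The 3-hour limit per request means that longer recordings have to be split before they are submitted. Below is a minimal sketch of planning that split, assuming the recording's duration is already known in seconds. `plan_chunks` and `MAX_REQUEST_SECONDS` are hypothetical names used for illustration; they are not part of any Voxtral SDK.

```python
MAX_REQUEST_SECONDS = 3 * 60 * 60  # 3-hour per-request limit stated above


def plan_chunks(
    total_seconds: float, max_seconds: float = MAX_REQUEST_SECONDS
) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording,
    with no chunk longer than max_seconds."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    chunks: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


# A 4-hour recording needs two requests: one of 3 hours and one of 1 hour.
for start, end in plan_chunks(4 * 60 * 60):
    print(f"{start:>8.0f}s -> {end:>8.0f}s")
```

Fixed offsets can cut a word in half at a chunk boundary, so in practice it may be preferable to move each boundary to a nearby pause in the audio.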
Voxtral Realtime: Real-Time Transcription Mastery
Voxtral Realtime is designed for applications where speed is critical. Its streaming architecture transcribes audio as it arrives instead of waiting for a complete recording. The delay is configurable: a 2.4-second setting suits subtitling, while at 480ms its accuracy remains close to that of batch models like Voxtral Mini Transcribe V2, enabling voice agents with near-offline precision.
Voxtral Mini Transcribe V2: Transcription Excellence
This model delivers exceptional performance with an average diarization error rate of just 4% across five English benchmarks and the TalkBank multilingual benchmark. It outperforms competitors in accuracy and processing speed, making it the go-to choice for high-quality transcription.
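For readers unfamiliar with the metric, diarization error rate is conventionally defined as the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. The sketch below implements that standard definition. The function name and the example durations are made up for illustration and are not Voxtral benchmark data.

```python
def diarization_error_rate(
    missed: float, false_alarm: float, confusion: float, reference_speech: float
) -> float:
    """Standard DER: (missed + false alarm + speaker confusion) / reference speech.

    All arguments are durations in the same unit (seconds here).
    """
    if reference_speech <= 0:
        raise ValueError("reference_speech must be positive")
    return (missed + false_alarm + confusion) / reference_speech


# Illustrative numbers only: one hour of reference speech with
# 60s missed, 36s of false alarms, and 48s assigned to the wrong speaker.
der = diarization_error_rate(
    missed=60.0, false_alarm=36.0, confusion=48.0, reference_speech=3600.0
)
print(f"DER = {der:.1%}")
```

Because false alarms count toward the numerator, DER can exceed 100% on very noisy output.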
Transforming Voice Applications
Voxtral Transcribe 2 empowers a wide range of voice applications across industries:
- Meeting Intelligence: Transcribe multilingual recordings with precise speaker attribution, making meeting content straightforward to annotate.
- Voice Agents and Virtual Assistants: Build responsive conversational AI with sub-200ms transcription latency.
- Contact Center Automation: Transcribe calls in real time so that downstream AI systems can run sentiment analysis and populate CRM fields while the call is still in progress.
- Media and Broadcast: Generate live subtitles with minimal latency, handling technical terms and proper nouns effortlessly.
- Compliance and Documentation: Monitor and transcribe interactions for regulatory compliance, ensuring clear speaker attribution and precise audit trails.
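Several of these use cases depend on combining word-level timestamps with speaker labels. The sketch below shows one way to fold diarized words into speaker turns for a meeting transcript or audit trail. The `Word` and `Turn` shapes are assumptions made for this example and do not reflect the actual API response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one diarized, timestamped word.
    text: str
    start: float  # seconds
    end: float  # seconds
    speaker: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: list[Word]) -> list[Turn]:
    """Merge consecutive words from the same speaker into a single turn."""
    turns: list[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1].end = word.end
            turns[-1].text += " " + word.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


words = [
    Word("Hello", 0.0, 0.4, "A"),
    Word("there", 0.4, 0.8, "A"),
    Word("Hi", 1.0, 1.2, "B"),
]
for turn in group_into_turns(words):
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Each turn keeps its start and end time, so a reviewer can jump from a line of the transcript back to the matching audio.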
Get Started with Voxtral Transcribe 2
Voxtral Mini Transcribe V2 is available via API at $0.003 per minute, and Voxtral Realtime is accessible at $0.006 per minute. Explore the models in the Mistral Studio audio playground or Le Chat for hands-on experience. For more information, visit the documentation and join our team to contribute to the future of speech AI.
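To put the per-minute prices above in concrete terms, here is a small cost estimate. It assumes billing is linear in audio minutes with no rounding or minimum charge, which the pricing above does not specify. The dictionary keys are labels for this example, not API model identifiers.

```python
from decimal import Decimal

# Per-minute prices in USD as quoted above; the keys are illustrative labels.
PRICE_PER_MINUTE = {
    "mini_transcribe_v2": Decimal("0.003"),
    "realtime": Decimal("0.006"),
}


def estimate_cost(model: str, audio_minutes: float) -> Decimal:
    """Estimate cost in USD, assuming simple linear per-minute billing."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return PRICE_PER_MINUTE[model] * Decimal(str(audio_minutes))


# 1,000 hours of audio is 60,000 minutes.
minutes = 1000 * 60
print("Batch:   ", estimate_cost("mini_transcribe_v2", minutes))
print("Realtime:", estimate_cost("realtime", minutes))
```

Under these assumptions, 1,000 hours of audio costs $180 with Voxtral Mini Transcribe V2 and $360 with Voxtral Realtime.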