OpenAI has released a major upgrade to its audio technology. The company has introduced new speech-to-text (converting voice to written words) and text-to-speech (converting written words to voice) models. These tools are now available to developers around the world.
These upgrades will make voice assistants, customer service bots, and creative applications more realistic and useful.
The new GPT-4o-Transcribe model understands spoken words more accurately than before. This is especially helpful in challenging conditions such as heavy accents, noisy environments, and varying speech speeds.
Tests show these models make up to 20% fewer mistakes than previous versions (like Whisper). They work well in over 100 languages, including English, Spanish, Hindi, and Korean.
This makes them well suited for accurately transcribing customer calls, meetings, and multilingual conversations.
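For developers, adopting the new model is essentially a drop-in change to OpenAI's existing transcription endpoint. The sketch below uses the OpenAI Python SDK; the file name meeting.mp3 is a placeholder, and optional parameters are omitted for brevity.

```python
# Minimal sketch: transcribing an audio file with the new model via
# the OpenAI Python SDK. "meeting.mp3" is a placeholder file name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # new speech-to-text model
        file=audio_file,
    )

print(transcript.text)  # the recognized text
```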
For the first time, developers can tell AI voices how to speak, with different emotions and styles. The new GPT-4o-Mini-TTS model can make voices sound cheerful, calm, apologetic, or dramatic, depending on the instructions it is given.
While limited to pre-created synthetic voices for safety reasons, this feature enables more expressive customer service agents, storytelling, and narration.
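In the API, this steering happens through an instructions parameter on the speech endpoint. The sketch below is illustrative, assuming the gpt-4o-mini-tts model and the "coral" preset voice are available to your account; the instruction text and output file name are placeholders.

```python
# Minimal sketch: steering a synthetic voice with the new
# text-to-speech model. The instruction text and voice name are
# illustrative; available voices and options may vary.
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the pre-created synthetic voices
    input="Thanks for calling. Your refund has been processed.",
    instructions="Speak in a warm, apologetic customer-service tone.",
) as response:
    response.stream_to_file("reply.mp3")  # write the audio to disk
```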
These improvements translate into concrete benefits across industries:

- Call centers can deploy AI that understands almost every word, even with background noise, and responds in a natural, appropriate tone.
- People with hearing impairments or language barriers benefit from more accurate transcriptions of spoken content.
- Writers and game developers can create expressive AI voiceovers that match their characters' personalities or the mood of a scene.
OpenAI trained these models on large collections of real-world audio recordings and used reinforcement learning (a type of AI training method) to reduce errors.
Smaller versions of these models (such as GPT-4o-Mini-Transcribe and GPT-4o-Mini-TTS) maintain good quality while using less computing power, making them more affordable for app developers.
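Because the full and mini models share the same endpoint, trading some accuracy for lower cost is just a different model string. In the sketch below, choose_model is a hypothetical helper written for illustration, not part of the OpenAI SDK, and "voicemail.mp3" is a placeholder.

```python
# Illustrative only: the mini model is a drop-in substitute on the
# same transcription endpoint. choose_model() is a hypothetical
# helper, not part of the OpenAI SDK.
from openai import OpenAI

client = OpenAI()

def choose_model(budget: bool) -> str:
    # Hypothetical policy: prefer the cheaper mini model when cost matters.
    return "gpt-4o-mini-transcribe" if budget else "gpt-4o-transcribe"

with open("voicemail.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model=choose_model(budget=True),
        file=f,
    )

print(result.text)
```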