How Human Can AI Sound? Try Sesame AI’s Conversational Speech Model Live

Mar 03, 2025

Sesame AI has made a significant leap in conversational voice technology, introducing a system that aims to cross the "uncanny valley" of AI-human interaction. On February 27, 2025, the company revealed its Conversational Speech Model (CSM), a breakthrough that promises to revolutionize how we interact with AI assistants.


Revolutionary Features

Emotional Intelligence

Adapts to emotional context of conversations, recognizing and responding to subtle cues in human speech.

  • Detects emotional states from voice tone and word choice
  • Adjusts responses based on user's emotional state
  • Builds emotional memory throughout conversations

Conversational Dynamics

Natural timing, pauses, and emphasis create authentic dialogue flow that mirrors human conversation patterns.

  • Incorporates natural pauses and fillers like "um" and "hmm"
  • Variable response timing based on complexity of questions
  • Appropriate interruption and turn-taking capabilities

Contextual Awareness

Adapts tone and style to match different situations, ensuring appropriate responses across contexts.

  • Recognizes formal vs. casual conversation settings
  • Adapts vocabulary and speech patterns to match context
  • Maintains appropriate professional boundaries

Low Latency

200ms generation time enables real-time interactions that feel responsive and natural during conversation.

  • Near-instantaneous response initiation
  • Eliminates awkward pauses in conversation flow
  • Enables real-time speech correction and adaptation

Voice Technology Comparison

The comparison covers three categories: Traditional AI (standard voice assistants), Sesame CSM (Conversational Speech Model), and Human Speech (natural conversation). The scores displayed are:

  • Natural Pauses & Timing: 9.5/10
  • Emotional Intelligence: 8.7/10
  • Contextual Adaptation: 9.2/10
  • Voice Presence: 9.8/10

The Quest for "Voice Presence"

At the heart of Sesame's innovation is the concept of "voice presence" – the ability of AI to engage in genuine dialogue that builds trust and understanding over time. The CSM achieves this through:

  • Emotional intelligence: Adapting to the emotional context of conversations
  • Conversational dynamics: Incorporating natural timing, pauses, and emphasis
  • Contextual awareness: Adjusting tone and style to match the situation
  • Consistent personality: Maintaining a coherent and appropriate presence

Talk to Sesame AI Now!

Experience the future of natural voice interaction, where conversations feel genuinely human. Press the call button below to start a conversation with either Maya or Miles, Sesame's advanced AI voice assistants.

 
The demo above is provided by Sesame AI. It is embedded directly from Sesame.com and is not hosted on this website. To experience it in its original context, visit Sesame's website directly.

Microphone permission is required. Calls are recorded for quality improvement but are not used for ML training.

Technical Innovations

The CSM operates as a single-stage, multimodal learning system that combines text and audio processing. Key features include:

  • Low-latency generation (200ms) for real-time interactions
  • Pronunciation correction and homograph disambiguation
  • Use of semantic and acoustic tokens for high-fidelity audio reconstruction
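
To make "homograph disambiguation" concrete: a homograph is a word with one spelling but several pronunciations, such as "lead" (to guide) and "lead" (the metal), and a speech system has to pick the right one from context. The toy sketch below is only an illustration of the problem. The cue-word table, the `pronounce` function, and the phonetic spellings are all hypothetical, and a model like the CSM presumably learns this from data rather than from hand-written rules.

```python
import re

# Hypothetical pronunciation table (illustrative only, not Sesame's data or method).
# Each homograph has a default pronunciation plus cue words that, when found
# elsewhere in the sentence, select an alternative pronunciation.
HOMOGRAPHS = {
    "lead": {
        "default": "LEED",
        "cues": {"pipe": "LED", "pipes": "LED", "paint": "LED", "metal": "LED"},
    },
    "bass": {
        "default": "BAYSS",
        "cues": {"fish": "BASS", "fishing": "BASS", "lake": "BASS"},
    },
}


def pronounce(word: str, sentence: str) -> str:
    """Pick a pronunciation for `word` using the other words in `sentence` as cues."""
    entry = HOMOGRAPHS.get(word.lower())
    if entry is None:
        return word  # not a known homograph, so leave it unchanged

    # Collect the surrounding words, excluding the homograph itself.
    context = set(re.findall(r"[a-z']+", sentence.lower())) - {word.lower()}

    for cue, pronunciation in entry["cues"].items():
        if cue in context:
            return pronunciation
    return entry["default"]


if __name__ == "__main__":
    print(pronounce("lead", "The old pipes were made of lead"))
    print(pronounce("lead", "She will lead the meeting"))
```

With this table, the first sentence resolves to the metal ("LED") because "pipes" appears nearby, while the second falls back to the default ("LEED"). A hand-written table like this breaks down quickly on real speech, which is part of why context-aware models are attractive for the task.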

Public Demo and Reception

Sesame's research preview, featuring AI companions Maya and Miles, has garnered significant attention:

  • The demo showcases human-like quirks, including filler words and contextual preferences
  • Social media reactions have been overwhelmingly positive, with industry leaders praising the technology
  • Journalists reported interactions so lifelike that bystanders mistook the AI for human conversation partners

Shopify CEO Tobi Lutke called the demo "absolutely insane," while Vercel CEO Guillermo Rauch described it as "astonishing."

 

Future Plans and Industry Impact

Sesame's plans span both hardware and software:

  1. Development of AI eyewear for all-day wearable audio interaction
  2. Expansion of language support to over 20 languages
  3. Creation of duplex models for improved conversational flow
  4. Open-sourcing of key components under the Apache 2.0 license

Founded by Oculus co-founder Brendan Iribe and speech technology expert Ankit Kumar, Sesame has secured Series A funding from Andreessen Horowitz and established offices in San Francisco, New York, and Bellevue.

Challenges and Ongoing Development

While the CSM has shown impressive results, Sesame acknowledges that challenges remain:

  • Fully replicating human-like prosody in extended conversations
  • Scaling up the model and dataset
  • Developing truly duplex models for natural turn-taking

As the race for audio-first computing heats up, Sesame's innovations position it at the forefront of a potential paradigm shift in human-computer interaction. With its combination of technical prowess and visionary leadership, Sesame is poised to redefine our relationship with AI assistants, potentially making screen-based interfaces a thing of the past.