Amazon unveils a new AI voice model, Nova Sonic | TechCrunch


On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims that Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.

Nova Sonic is Amazon’s answer to newer AI voice models such as the model powering ChatGPT’s Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa’s early days. Recent technological breakthroughs have made legacy models and the digital assistants they underpin, such as Alexa and Apple’s Siri, seem incredibly stilted by comparison.

Nova Sonic is available via Bedrock, Amazon’s developer platform for building enterprise AI applications, through a new bi-directional streaming API. In a press release, Amazon called Nova Sonic “the most cost-efficient” AI voice model on the market, and around 80% cheaper than OpenAI’s GPT-4o.

Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Amazon SVP and Head Scientist of AGI Rohit Prasad.

In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, said Prasad. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application, and use the right tool to do so.

During a two-way dialogue, Nova Sonic waits to speak “at the appropriate time,” taking into account a speaker’s pauses and interruptions, says Amazon. It also generates a text transcript of the user’s speech, which developers can use for various applications.

Nova Sonic is less prone to speech recognition errors than other AI voice models, according to Prasad, meaning the model is comparatively good at understanding a user’s intent even if they mumble, misspeak, or are in a noisy setting. On a benchmark measuring speech recognition across languages and dialects, Multilingual LibriSpeech, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means that roughly four out of every 100 words from the model differed from a human transcription in those languages.
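For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the human reference, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the sample sentences are hypothetical, and this is not Amazon's benchmarking code, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace; real benchmarks
    # typically apply more careful text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# against a six-word reference.
wer = word_error_rate(
    "turn on the kitchen lights please",
    "turn on the chicken lights",
)
print(f"WER: {wer:.1%}")
```

Here two errors against six reference words give a WER of about 33%. A 4.2% WER corresponds to roughly four such errors per 100 reference words.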

On another benchmark measuring loud interactions with multiple participants, Augmented Multi-Party Interaction, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI’s GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, with an average perceived latency of 1.09 seconds, according to Amazon. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis.

Prasad says Nova Sonic is part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “AI systems that can do anything a human can do on a computer.” Moving forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as “other sensory data that are relevant if you bring things into the physical world.”

Amazon’s AGI division, which Prasad oversees, seems to be playing a larger role in the company’s product strategy lately. Just last week, Amazon released a preview of Nova Act, a browser-using AI model that appears to be powering elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.
