This new AI can simulate your voice with just 3 seconds of sound

Microsoft's new language model, Vall-E, is reportedly able to mimic any voice using only a three-second recording as a sample.

The recently unveiled AI tool was trained on 60,000 hours of English speech data and can replicate the emotions and tone of a speaker, researchers said in a paper posted to arXiv, Cornell University's open-access repository.

Those results held even when Vall-E generated recordings of words the original speaker never actually said.

“Vall-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt,” the authors wrote. “Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot [text-to-speech] system in terms of speech naturalness and speaker similarity. In addition, we find Vall-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”

Microsoft corporate booth signage is displayed at CES 2023 at the Las Vegas Convention Center on January 6, 2023, in Las Vegas, Nevada.
(Photo by David Baker/Getty Images)

Vall-E samples shared on GitHub sound eerily similar to the original speakers, though they range in quality.

In one sentence synthesized from the Emotional Voices Database, Vall-E calmly says: “We have to reduce the number of plastic bags.”

However, the research into text-to-speech AI comes with a warning.

“Since Vall-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker,” the researchers wrote. “We conducted the experiments under the assumption that the user agrees to be the target speaker in speech synthesis. If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model.”

Microsoft Corp. signs at the Microsoft India Development Center, in Noida, India, on Friday, November 11, 2022.
(Photographer: Prakash Singh/Bloomberg via Getty Images)

Currently, Vall-E, which Microsoft calls a “neural codec language model,” is not available to the public.
