A new artificial intelligence model called VALL-E can replicate your voice after listening to it for just three seconds. A promising yet worrying advance that currently works only in English.

After ChatGPT for text generation, Midjourney for creating paintings and illustrations, and MusicLM for music creation, a new artificial intelligence model is taking on the voice, and not just any voice: yours. Designed by Microsoft and first presented in January, VALL-E can create audio messages that reproduce the sound of your voice.
As our colleagues from Numerama explain, VALL-E is based on the concept of text-to-speech synthesis, or TTS. In other words, it can speak any text you choose, working from that written text on the one hand and, on the other, from a recording of a voice (yours, in this case) reciting any other text. VALL-E's main advantage, however, is the listening time it needs to replicate your voice: just 3 seconds, instead of the minimum of 60 seconds required by a competing technology introduced by Amazon last year.
VALL-E currently only works in English
Since VALL-E was announced in early January, researchers have been able to run numerous tests, both qualitative and quantitative, Numerama notes, and these have proved conclusive. Microsoft's AI is advanced enough to outperform current models in terms of realism. In other words, this new AI can imitate your voice very realistically… and without a robotic delivery.

To achieve this result, VALL-E relies on a dense training corpus made up of 60,000 hours of English-language recordings from 7,000 different speakers, we learn. However, there are two caveats: Microsoft's AI is currently limited to English (its training corpus contains only English-speaking voices), and it is probably not very comfortable with accents it has not yet encountered, of which English has many.
Beware of the risk of misuse…
In any case, and even if it still needs some training, VALL-E could soon be used for many different applications, in particular "for the simplification of production, or the reduction of costs", underlines Louis-François Bouchard, a doctoral student at the Quebec Institute of Artificial Intelligence interviewed by Numerama. However, we must be realistic: this new AI model will also be used to create voice deepfakes… and for the abuses that come with them.
"It is a tool that can be useful and used in a totally legal way. But it can also be misused. It all depends on the hands it'll be in", Louis-François Bouchard acknowledges. "I think in the future we will have to be very attentive to what we see and hear online", he added.
Microsoft is aware of the problem. It remains to be seen how the firm is preparing to tackle it… and on that front, the group's current policy seems geared more towards getting its various AIs to market quickly than towards the ethics that should go with them. As proof, the firm recently laid off a team specializing in precisely this issue.