What AI Voice Generation Means for Humans


Chinese tech giant Baidu recently announced an impressive, albeit alarming, technological advancement: its artificial intelligence can clone a voice from just 3.7 seconds of audio / Photo by: Shwangtianyuan via Wikimedia Commons


Chinese tech giant Baidu recently announced an impressive, albeit alarming, technological advancement: its artificial intelligence needs only 3.7 seconds of audio to clone a voice.

That short sample marks major progress for the company's voice cloning tool, called Deep Voice, which previously needed 30 minutes of audio to do the same. The advancement shows how quickly the technology for creating artificial voices is improving.

"In just a short time, the capabilities of AI voice generation have expanded," wrote Bernard Marr, strategic business and technology advisor, on Forbes. He added that AI-generated voices are becoming more realistic, which makes the technology easier to misuse.

Beyond the progress of Deep Voice, the capabilities of AI voice generation also include changing the gender of a voice and switching between accents and styles of speech.

Google's text-to-speech system, Tacotron 2, makes it almost impossible to tell an AI-generated voice from a human one. The Alphabet-owned company also made the voice of singer John Legend available on any US-based device with Google Assistant, where it responds to questions like "What's the weather?" and "How far away is the moon?"

"This advanced technology opens the door for companies such as Lyrebird to provide new services and products," Marr wrote.

Lyrebird uses AI to create voices for chatbots, audiobooks, video games, text readers, and more. On its website, the company acknowledged that "with greater innovation comes greater responsibility" and emphasized that technology leaders must take care to prevent misuse of these advancements.

According to Marr, as the technology improves and it becomes harder to tell what's real from what's artificial, more opportunities emerge to exploit AI algorithms to mislead people.

"Now that these AI systems only require a short amount of audio to train in order to create a viable artificial voice that mimics the speaking style and tone of an individual, the opportunity for abuse increases," the tech advisor said.