Support the Timberjay by making a donation.

Serving Northern St. Louis County, Minnesota

The rise of the digital voice clones – are you next?

David Colburn
Posted 2/1/24

Instead of having to read his insightful articles, how would you like to have Marshall Helmberger read them to you?
To make that happen now, Marshall would have to record himself reading each article and we’d have to post the audio files to each article on the website, a rather time-consuming task that isn’t ever going to happen given Marshall’s incredibly busy schedule.
But what if I told you that we could make it happen without Marshall ever having to press a record button? With the latest in artificial intelligence voice cloning, it’s possible.
In case you weren’t aware, Marshall does a weekly radio interview recapping our top stories on KAXE radio (91.7 FM), and most of those interviews are archived on the station’s website. Those recordings give me the necessary sample of Marshall’s voice to create a clone, something easily done by going to one of several AI voice cloning websites available.
Once the site has the audio sample, it uses AI to analyze it and create a “voice” based on the tonal qualities it “hears,” and when you type in a text-based message it will create an audio recording of that person’s voice reading what you typed. In Marshall’s case, I could create his voice, cut and paste one of his articles into the text field, and voila! Marshall is reading his story to you in a voice you’d swear on a stack of Bibles was his own and not an AI-generated fake.
I tried it on one of the sites using my own voice, creating the clone and pasting in a portion of the North Woods – Cherry basketball game story, and immediately noticed one glitch. While it sounded exactly like me, it didn’t know how to pronounce the last name of Isaac Asuma, saying “un-SOOM-uh” instead of “AH-soom-uh.” But the vocal quality was spot on; it sounded just like me reading the story.
One of the best uses for this amazing new tech would be for people who have lost the ability to speak. If there’s a recording of their voice, it could be made into a digital voice clone. Sen. David Tomassoni was having a digital voice clone developed for himself when his ALS made it nearly impossible to speak, but the tech has advanced exponentially since then. Combined with a computer and video system that lets an ALS patient type by looking at a keyboard, that person could carry on conversations using their own still fluent, clear digital voice clone.
For digital media content creators, digital voice clones can speed up the creative process by allowing input of scripts for video voice-overs. Coupling a digital voice clone with a generative AI chatbot can turn a company’s phone receptionist into an interactive customer service rep that sounds exactly like that person. Actors who do commercial voice-over work could use digital voice clones to get additional work. Creating audiobooks would be a breeze, requiring 30 seconds of recording for a voice clone rather than the author having to read the entire book. And in a nod to our opening example, digital voice clone newsreaders could function 24 hours a day, constantly refreshed with AI-generated content. The possibilities are extensive and astounding.
But as we’ve already seen, there’s a dark side to this technology as well. In last week’s New Hampshire presidential primary, a digital voice clone of Joe Biden was used for robocalls telling Democrats not to vote. Getting a politician’s or celebrity’s voice to clone is simple, and there are lots of sites and apps “for fun” that let you create audio and video files putting any words you want in their mouths.
Scarier still are the crimes being committed by using digital voice clones. People who have posted videos of themselves online talking have had their voices cloned by unscrupulous scammers who have used those voices to try to authorize illegal financial transactions. They’ve created “family scare” scams in which a person receives a phone call supposedly from a loved one who is experiencing a crisis and needs money immediately. In one such scenario, the cloned voice says hello and a few words, and then another person comes on the phone claiming to have kidnapped the loved one and demanding that a ransom be sent through a cash payment app. Such scams have become prevalent enough that the Federal Trade Commission released a warning about them last spring.
The best way to avoid any such scams is to not post any clips of yourself or others speaking to social media sites. At the very least, use privacy controls on social media to restrict access to your media files, and don’t accept friend requests from people you don’t know. Assume that anything you post online can be found by anyone unless you’ve taken specific steps to keep it hidden and secure.
Of course, equally scary is the pairing of digital voice clones with AI-generated video of someone, video with lips moving in perfect sync to the words it was given. I created one of those for Joe Biden just the other day, warning of the dangers of bananas. There are numerous samples on YouTube – Biden giving advice on how to get a goth girlfriend and telling the story of the magic pistachio are two unusual ones that demonstrate the tech without fear of accidentally creating an international crisis. But videos that could cause such a crisis may be coming, much more easily and faster than we’re ready for.