I cloned my voice with AI and even my wife can’t tell the difference

Listening to your own voice saying words you’ve never actually said is an unsettling experience, but in the AI future we’re living through right now in 2024, it’s hardly a shock. Of course AI can now clone your voice and make it sound just like you! It’s almost expected, isn’t it?

What is surprising, to me at least, is how easy it is to do. You can access an AI voice cloner online for free, clone your voice, and get it to say anything you want, all in just a few minutes. The training takes just 30 seconds, and then you’re good to go. There are no real security checks or restrictions on what you can do with that voice once you’ve trained it, either. You could make it swear or threaten somebody. There seem to be hardly any guardrails.

Who’s that voice?

If you type ‘AI Voice Cloner’ into a Google search bar you’ll be spoiled for choice. A lot of the voice cloners require you to sign up for a monthly fee before they will clone your voice, but quite a few of them have a free option. I tried a few of the free choices and some of them, despite promising unparalleled accuracy, produced a robotic version of my voice that was going to fool nobody. No, I had a higher goal in mind: I wanted to produce a clone of my voice that would fool my wife.

I eventually settled on Speechify to clone my voice, since it combined ease of use, full access to the voice cloner, and a 30-second training time. Once you’ve made a free account on Speechify, you simply talk into your microphone for 30 seconds or longer to train your AI voice. After that, you can type in some text and hit the Generate button to hear the words spoken back to you in your own voice.
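
If you’re curious what that record-a-sample-then-generate workflow looks like under the hood, the same idea exists in open-source form. Below is a minimal sketch using the Coqui TTS Python library and its XTTS v2 voice-cloning model; to be clear, this is a general illustration of zero-shot voice cloning, not Speechify’s own (proprietary) pipeline, and the file names are placeholders.

```python
# A minimal sketch of zero-shot voice cloning with the open-source
# Coqui TTS library (pip install TTS). This illustrates the general
# record-then-generate workflow; it is not Speechify's pipeline.
from TTS.api import TTS

# Load XTTS v2, a multilingual model that can clone a voice from a
# short reference recording of the target speaker.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# "my_voice_sample.wav" is a placeholder: a short, clean recording
# of the voice you want to clone, like the 30-second sample above.
tts.tts_to_file(
    text="Hello, this is my cloned voice reading words I never said.",
    speaker_wav="my_voice_sample.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

The striking part is how little reference audio these models need, which is exactly why a 30-second training clip is all a service like Speechify asks for.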

If you’re concerned about security, Speechify has a pretty detailed privacy statement, which says it will never sell your information and is committed to protecting the privacy of your data. So, your uploaded voice should be yours alone to use.

I thought what I created was pretty convincing, but I needed to see what my wife thought. I crept up behind her and played a sample clip of ‘me’ and… well, OK, she laughed, because she could tell it was coming out of my MacBook’s speakers, but she was impressed. “Actually,” she said, “I think it sounds like you, but better.”

And that is the benefit of cloning your voice: it doesn’t make mistakes when it talks. There are no ‘ums’ and ‘ahs’, and it gets everything right the first time. If I think about how many times I’ve had to record and re-record the intros to my podcasts because I couldn’t get them quite right, I can see an obvious application for an AI voice cloner. But that’s also the danger of AI voice cloning, because you can get the fake voice to say just about anything.

Daisy, the AI granny, was an AI voice created to trap scammers in long and fruitless phone calls. (Image credit: O2 Virgin Media)

Voices from the beyond

While scams that involve stealing your voice are one level of concern, the security implications extend even beyond the grave. Recently it was announced, to everybody’s surprise, that the legendary late British talk show host Michael Parkinson is ‘launching’ a new podcast called Virtually Parkinson. Thanks to the miracles of AI, his voice will be interviewing people in real time once again. In Parkinson’s case, his estate is fully behind the podcast, but what if permission hasn’t been given?

David Attenborough, the grandfather of the BBC’s natural history programming, recently expressed unease at an AI version of his voice, describing it as “disturbing”. We live in an age where AI can create podcasts without any human interaction, and even AI sports presenters are starting to appear. So, in a way, we shouldn’t be surprised that it’s so easy for AI to clone our voices, but the implications could be profound.

With AI giving celebrities (or rather, their estates) the option to continue working long after they have shuffled off this mortal coil, the future for celebrities and ordinary individuals alike suddenly seems very uncertain.
