ChatGPT, DALL-E, generative AI, machine learning, neural networks… You have probably heard or read these buzzwords in the news recently. Advancements in artificial intelligence (AI) are reimagining the world as we know it. They are seeping into many fields, and the world of podcasts is no exception.
Recently, there has been a lot of buzz about The Joe Rogan AI Experience, which recreates The Joe Rogan Experience podcast using ChatGPT and voice cloning software. I have to admit I was highly sceptical about the AI-generated parody when I tuned in to hear episode 1 featuring the (fictional) Sam Altman, CEO of OpenAI. I expected to hear a jarring robotic voice rambling gibberish. However, the episode was quite realistic and, for the most part, mimicked an actual podcast episode. The speech patterns of Rogan and Altman were replicated to an extent. Still, there were instances of monotonous, stilted and verbose speech, which served as a reminder that this was not the real deal.
The spontaneity of conversation and the human quirks that attract most podcast listeners were lacking in the AI-generated parody. The creator of the podcast, Hugo, recognised this fatal flaw and regarded the podcast as an experiment showcasing the advancement of AI voice tools. Beyond that, he called it “wasted time”. The first episode of the AI-generated parody gained popularity when the (real) Joe Rogan tweeted about it, but viewership declined for subsequent episodes. Critics have accurately noted that AI-generated podcasts can’t compete with podcasts that thrive on the intimacy of conversation.
While AI-generated podcasts haven’t advanced to the stage where they can simulate witty human conversation, there might be potential for using AI in podcasts that provide more generic content. An example is the AI-generated Hacker News Recap by WondercraftAI, which provides bite-sized summaries of Hacker News. In a bid to grab the attention of the audience, creators leverage technology to establish their own niche on the internet. The advancement of AI, coupled with the booming creator economy in which creators are finding new ways to monetise their content, has led to the use of AI editing tools for podcasts and voice cloning software for advertisements.
In the spirit of experimentation for this article, I decided to try cloning my voice using AI to ascertain how accurate and easy it might be for podcast creators to clone voices. My first few attempts failed. I signed up for a free account on Speechify and recorded a voice sample, but I couldn’t replicate my voice as the website displayed error messages. My second attempt involved Resemble AI, which initially advertised a free basic version but put its services behind a paywall when I tried to proceed after signing up. I then tried ElevenLabs. After signing up for a free account, I realised that the voice cloning services were only available to premium users. I therefore decided to play around with their Voice Design feature instead, which allows users to select specific attributes and randomly generate voices to read out text. To generate a voice as close to my own as possible, I selected female, young and Indian for the gender, age and accent parameters, and adjusted the accent strength over several iterations. In my opinion, the software generated a human voice that enunciated the text clearly but somehow didn’t quite sound like the voice of a young Indian female and was certainly not similar to my voice.
AI-generated podcasts, like AI-generated artwork, open a Pandora’s box of questions regarding AI regulation and copyright issues. Who owns The Joe Rogan AI Experience? Is it the “creator” of the parody, Hugo, the companies that own the AI tools used to create the podcast, or the owners of the data on which these AI tools have been trained? In the words of Joe Rogan, “This is going to get very slippery, kids.”
Beyond copyright issues, there are concerns related to the privacy of data and the malicious use of AI. I am not quite sure how the various voice tools that I experimented with will use the data I provided when I signed up for their services. The terms and conditions for using these tools were lengthy, laden with jargon and difficult to understand. I am not sure if my voice sample on Speechify will be stored, or if it will be protected. I might have just signed up to be spammed with an eternity of marketing emails and potential misuse of my voice sample, which is disturbing. AI-powered tools can be leveraged to impersonate celebrities, as seen with deepfakes, and to spread misinformation. ElevenLabs has been used in the past to make deepfake voices of celebrities like Rogan and Emma Watson voicing inappropriate, racist and transphobic opinions. Given these concerns, the regulatory landscape for AI implementation needs to adapt quickly. It will be interesting to see if the proposed European Union Artificial Intelligence Act will be able to tackle the ethical implications of AI implementation and curb its misuse in various fields, including content creation.
AI-generated podcasts are currently in their nascent stage. The trajectory of AI use for podcasts will depend on the development of AI regulation in conjunction with the advancing technology and changing creator economy. Buckle up – we are in for an exciting ride for the future of podcasts.