  • Rest of World

    India’s star audio content company is going all in on AI. Will listeners tune in?

    By Ananya Bhattacharya

    2024-07-29

    One of India’s leading homegrown audio-streaming platforms is banking on artificial intelligence to drive its future growth.

    On June 20, Pocket FM, an app that produces and streams fictional audio series, said it has partnered with ElevenLabs, a popular American text-to-speech AI startup, to convert scripts into audio. While previously scriptwriters had to hire voiceover artists and editors to create content for Pocket FM, the new integration allows them to convert their scripts into audio with one click, with an option to choose their preferred voice and background music.

    This is the first partnership ElevenLabs has announced with an Indian company.

    Echoing global concerns about the use of AI for creating art, some voice artists and digital technology experts in India have raised questions about the partnership. Some artists believe Pocket FM may have surreptitiously used their voice samples to train AI models. The two companies have denied the allegations.

    “We really don’t know what kind of infringed content we’re consuming on the platform,” digital analyst Ami Shah told Rest of World.

    Pocket FM was co-founded in 2018 when entrepreneur Rohan Nayak got together with Nishanth KS and Prateek Dixit to create and distribute fictional audio shows.

    Prior to the ElevenLabs partnership, the app had a library of more than 100,000 hours of content: mainly 10- to 15-minute episodes of over 2,000 audio shows, spanning genres like romance, thriller, and horror. Pocket FM’s two most popular shows, Insta Millionaire and Saving Nora, have more than 1,700 and 2,100 episodes, respectively.

    Pocket FM, which is available in 20 countries, had been downloaded at least 189 million times worldwide as of June 26, according to market insights platform Sensor Tower. The company clocked $150 million in annual recurring revenue in 2023 on the back of its micropayments model, where users pay as little as 1 rupee or 30 cents per episode instead of buying a subscription.

    “A user typically doesn’t want to spend a lot on a newer format,” Mukesh Kumar, associate partner at marketing and consulting firm RedSeer, told Rest of World. “This monetization model makes it very affordable for everyone.”

    The U.S. is Pocket FM’s second-largest market after India, Sensor Tower data shows, with 2.3 million users. It has an office in Los Angeles. The company gets over 85% of its revenue from the U.S., according to analytics platform AppMagic. On July 27, digital news publication The Morning Context reported that Pocket FM laid off more than 250 writers in the U.S.

    “This is incorrect,” Pocket FM said in a statement about the report. “We have recently had to part ways with some of our writers for U.S.-based audio series to align our resources with our current show pipeline. These changes are typical in the content creation industry.”

    “I think we underestimate the amount of effort and creativity needed to create a great story,” Nayak had said earlier in July about the prospect of AI replacing scriptwriters at Pocket FM. “I think it’s very, very hard. So I don’t see that at all.”

    Pocket FM has used AI in the past to improve audio quality, make personalized recommendations to listeners, and predict blockbusters. AI voice cloning is a natural next step to better efficiency and cost savings, Nayak told Rest of World. He said Pocket FM used AI cloning to produce another 5,000 shows during a six-month trial with ElevenLabs before the partnership was formally announced.

    The company now offers text-to-speech AI in English and German, and plans to add more languages as it expands across Europe and Latin America this year.

    “As we continue to scale, we may look at enabling voice AI across markets to enhance our efficiency and speed,” Nayak said.

    The partnership with ElevenLabs will help convert scripts into high-quality audio programs at a quicker pace, “which will help writers start their monetization faster and also help Pocket FM to scale up faster in newer markets,” Harsha Kumar, partner at venture capital firm Lightspeed Ventures, a seed investor in Pocket FM, told Rest of World.

    But voice artists are not convinced.

    At the time of announcing the partnership, the two companies said they had produced over 30,000 hours of audio shows during an “experimental phase,” and slashed costs by 90%.

    The timing for this experiment overlapped with a contest that Pocket FM launched in India. Contestants were asked to record and send a 10-minute voice sample in English, Hindi, Tamil, or Telugu. The competition’s initial licensing clause gave Pocket FM “exclusive, royalty-free, perpetual, transferable, irrevocable and fully sublicensable” rights to use the samples, create derivative works from them, and distribute them.

    Surjan Singh, a voice artist with over 20 years of experience, called out Pocket FM for the unfettered access.

    “Although it might be a great opportunity for beginners but please be mindful of the fact that it asks for a 10 minute sample … [whereby] your voice can be cloned and created into many hours of content without your knowledge,” Singh commented on the company’s LinkedIn post announcing the contest. “AI replica is a real threat for voice actors.”

    Singh told Rest of World that he feels Pocket FM used the competition as “a very disguised mechanism” to collect samples for AI voice cloning for its project with ElevenLabs.

    Another voice artist, Manohar Rao, told Rest of World he submitted over two dozen 10-minute samples for the competition without hearing back. “Certainly somewhere I have a fear that my samples in different genres and accents and emotions have already gone to them,” Rao said. Since these are samples, he added, he has no copyright claim on them. Pocket FM announced the winners of the competition in June. The company claims to have selected more than 40% of the 200 entries received last quarter to voice the audio shows on its platform.

    Pocket FM, which subsequently removed the controversial clause, denies these allegations. “It’s [ElevenLabs’] voices that they train on their proprietary data. It’s nothing to do with us effectively right now,” Nayak said. “We take user data very seriously. We haven’t shared any shred of data with ElevenLabs.”

    Pocket FM now puts a disclaimer on its campaigns, stating all samples are for reference only and that they will not be reproduced or used for commercial purposes, the company told Rest of World. It has also started experimenting with labeling AI-generated audio content, Nayak said, without sharing further details.

    Sam Sklar, head of growth at ElevenLabs, told Rest of World that Pocket FM did not submit any voice samples or data for this project. He said the samples are from ElevenLabs’ voice library, which has thousands of community-submitted voices suitable for text-to-speech in 29 languages.

    “Artists need to understand the agreements which they sign with organizations in terms of scope of use of voice content — directly or indirectly,” Jyoti Joshi, founder and CEO of deep-tech startup Kroop AI, told Rest of World. “As far as I know, voice data can be purchased for training purposes with the due permission of the artists. The industry works well within the stated agreements and implementing guidelines to ensure responsible use of AI.”


    Ananya Bhattacharya is a Rest of World reporter based in Mumbai, India.
