The end of 2022 saw AI technologies reach widespread adoption on the heels of the stunning popularity of OpenAI's ChatGPT. For the first time, AI achieved mass-market appeal by proving its utility and value in creating successful business outcomes.
Many AI technologies that seem revolutionary to everyday people in 2023 have actually been in active use by big businesses and media companies for several years. Join me as I take a closer look at the technology powering these solutions, in particular generative AI for voice cloning, its business benefits, and ethical approaches to using it.
How does voice cloning work?
In short, voice cloning enables one person to speak using the voice of another person.
It uses generative AI to analyze recordings of a person’s voice and generate new audio content in that same voice. It essentially allows people to hear what someone would have said, even if they never said it themselves.
On the technical side, things don’t appear to be very complicated. But if you dive a little deeper, there are some minimum requirements to get started:
- You need at least 5 minutes of high-quality recorded audio of the source voice to clone it. These recordings should be clear and free of background noise or other distortions, as any imperfections could affect the accuracy of the model’s output.
- After that, feed these recordings into a generative AI model to create a “voice avatar.”
- Then, train the model to accurately reproduce speech patterns in pitch and timing.
- Once training is complete, the model can generate unlimited content in the source voice from any text, becoming an effective tool for creating realistic-sounding replica voices.
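To make the pipeline above concrete, here is a minimal sketch of a voice-cloning workflow, assuming the open-source Coqui TTS library and its XTTS voice-cloning model; the model name and file paths are illustrative assumptions, not a description of any specific vendor's production system.

```python
# Minimal voice-cloning sketch (assumes the open-source Coqui TTS package,
# installed via `pip install TTS`; model name and file paths are illustrative).
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model. The "voice avatar" step
# is handled internally: the model conditions on a short reference recording.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

# reference.wav: a clean, noise-free recording of the source voice.
# The output is a new utterance spoken in that cloned voice.
tts.tts_to_file(
    text="Any new sentence the source speaker never actually recorded.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

Even this off-the-shelf approach illustrates how low the technical barrier has become, which is exactly why the ethical questions below matter.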
This is the point at which many raise ethical concerns. What happens when we can insert any text into another person’s mouth and it’s impossible to tell if those words are real or fake?
Yes, this possibility has long since become a reality. As in the case of OpenAI and ChatGPT, we are currently facing a number of ethical issues that cannot be ignored.
Ethical standards in AI
As with many other novel technologies in the early stages of adoption, the main threat is creating a negative stigma around the technology rather than treating the threats as a source of discussion and valuable knowledge. What is important is to expose the methods that bad actors use to abuse the technology and its products, apply mitigation tools, and keep learning.
Today we have three layers of frameworks for ethical standards in generative AI. The national and supranational regulatory layer is in its initial stage of development. The policy world may not keep up with the speed of emerging technology, but we already see the EU leading with the EU Proposal on AI Regulation and the 2022 Code of Practice on Disinformation, which outlines the expectations for big tech companies to tackle the dissemination of malicious AI-manipulated content. On the national level, the US and UK have taken their first regulatory steps to address the issue with the US National Deepfake and Digital Provenance Task Force and the UK Online Safety Bill.
The tech industry layer is moving faster as companies and technologists accept the new reality of emerging technologies and their impact on societal security and privacy. The dialogue on the ethics of generative AI is vibrant and has grown into industry initiatives to develop codes of conduct for generative AI (e.g., the Partnership on AI Synthetic Media Code of Conduct) and ethical statements by companies. The question is how to make these codes practical and whether they reach products as concrete features and team procedures.
Having worked on this problem with a number of media and entertainment, cybersecurity, and AI ethics communities, I have formulated a few practical principles for dealing with AI-generated content, and voices in particular:
- IP owners and the company that uses the cloned voice can avoid many of the potential complications associated with using original voices by signing legal agreements.
- Project owners should publicly disclose the use of a cloned voice so that listeners will not be misled.
- Companies working on AI technology for voice should allocate a percentage of resources to developing technology that is capable of detecting and identifying AI-generated content.
- Labeling AI-generated content with watermarks enables the audio to be authenticated later (see the sketch after this list).
- Each AI service provider should review each project for its impact (at the societal, business, and privacy levels) before agreeing to work on it.
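The labeling principle mentioned above can be as simple as attaching a signed provenance manifest to every file a service generates. Here is a minimal sketch assuming a sidecar-metadata approach with a hypothetical provider-held signing key; real-world schemes such as inaudible audio watermarks or C2PA-style provenance manifests are more robust, but the idea is the same.

```python
# Minimal provenance-labeling sketch: a sidecar manifest signed over the audio
# bytes. SECRET_KEY, file names, and the manifest fields are illustrative.
import hashlib
import hmac
import json

SECRET_KEY = b"provider-signing-key"  # hypothetical key held by the AI provider


def label_generated_audio(audio_path: str, manifest_path: str) -> None:
    """Write a manifest declaring the file as AI-generated, signed over its bytes."""
    audio_bytes = open(audio_path, "rb").read()
    tag = hmac.new(SECRET_KEY, audio_bytes, hashlib.sha256).hexdigest()
    manifest = {"ai_generated": True, "generator": "voice-cloning-service", "hmac": tag}
    with open(manifest_path, "w") as f:
        json.dump(manifest, f)


def verify_label(audio_path: str, manifest_path: str) -> bool:
    """Confirm the manifest matches the audio, i.e. the label has not been tampered with."""
    audio_bytes = open(audio_path, "rb").read()
    with open(manifest_path) as f:
        manifest = json.load(f)
    expected = hmac.new(SECRET_KEY, audio_bytes, hashlib.sha256).hexdigest()
    return manifest.get("ai_generated") is True and hmac.compare_digest(expected, manifest.get("hmac", ""))
```

A sidecar manifest only proves that a cooperating provider labeled the file; it cannot flag content from providers who refuse to label, which is why the detection work described in the third principle remains necessary.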
Of course, these principles of AI ethics won’t stop the spread of homemade deepfakes online. However, they will push gray-zone projects out of reach of the public market.
In 2021-22, AI voices were used in several mainstream projects that carried weighty implications for ethics and society. These included cloning young Luke Skywalker’s voice for The Mandalorian series, Atreus’ voice for God of War 2, and Richard Nixon’s voice for the historical ‘In Event of Moon Disaster’ project.
Confidence in technology is growing beyond media and entertainment. Traditional businesses across many industries are using cloned voices in their projects. Here are a few of the most prominent use cases.
Industry use cases
In 2023, voice cloning will continue its rise as businesses across industries reap its numerous benefits. From healthcare and marketing to customer service and advertising, voice cloning is revolutionizing how organizations build relationships with their clients and streamline their workflows.
Voice cloning benefits healthcare professionals and social workers who work in an online environment. Digital avatars featuring the same voice as the medical professional foster stronger bonds with patients, building trust and improving retention.
The potential applications of voice cloning in the film and entertainment industry are vast. Dubbing content into multiple languages, additional dialogue replacement (ADR) for both child and adult actors, and an almost infinite array of customization options are all made possible by this technology.
Similarly, in the operations sector, AI-driven voice cloning can yield excellent results for brands in need of cost-efficient solutions for interactive voice response systems or corporate training videos. With voice synthesis technology, actors can expand their reach while increasing their ability to earn residuals from recordings.
Finally, in advertising production studios, the emergence of voice cloning has significantly reduced the costs and hours associated with commercial production. As long as a high-quality recording is available for cloning, even when the actor is unavailable, ads can be produced more quickly and creatively than ever before.
Interestingly enough, both enterprises and SMBs can take advantage of voice cloning to create something unique for their brands. Big projects can realize their most ambitious plans, while small businesses gain access to production capabilities that were previously cost-prohibitive. That’s what true democratization means.
Wrapping up
AI voice cloning offers businesses game-changing benefits such as creating unique customer experiences, integrating natural language processing capabilities into their products and services, and generating highly accurate impersonations of voices that sound completely real.
Businesses looking to maintain their competitive edge in 2023 should look into AI voice cloning. Companies can use this technology to unlock a variety of new possibilities to win market share and retain customers while doing so in an ethically responsible way.