In recent years, the use of AI voices in various applications has increased immensely. From voice assistants and chatbots to audiobooks and eLearning, AI voices, providers would have us believe, are ready for use anywhere and equal to their flesh-and-blood counterparts. But is this true when superlatives are used to talk about quality and naturalness? We listened more closely, and – spoiler alert – we are far from ready to leave audio projects to bits and bytes with a clear conscience.
One of the most apparent reasons AI voices cannot replace professional voice actors in the long run is their lack of human intonation and emotion. While AI voices are perfectly capable of pronouncing words and sentences correctly these days, they cannot place the correct stresses and pauses to add meaning and emotion to what is being said. Why? Because AI does not understand the text it is reading. Yet it is precisely this understanding that allows us to make essential connections through sentence melody, stresses, and dynamics. It is this interplay that brings a spoken text to life.
Flawed understanding
Lack of comprehension leads to monotonous and flat-sounding texts, resulting in lower engagement and interest on the listener's part. This is not ideal in eLearning and is disastrous in advertising. There is no real improvement to be expected here soon, either. Yes, the voices themselves sound natural. With machine learning and lots of training, AI can imitate sentence melodies that might match the text.

Consciousness is paramount for understanding copy – which AI lacks
But: what if they don't? What if the director would have liked the last paragraph to be spoken with a bit more "red carpet" or the previous sentence to be spoken with an emphasis on a particular word? What if the attitude is fundamentally too soft or too matter-of-fact? How do you convey to the language model that it should approach the copy with more pressure, sensitivity, warmth, or aggression?
It. Does. Not. Work. It might work one day. But not for the foreseeable future.
Awareness, experience, reflection
Why? Because the machine is not conscious of it. But consciousness is key. When a talent reads a text, she reflects the copy with her voice and her whole experience as a talent, human, and sentient being. And she gives the copy an individual character, which stage directions can even fine-tune.
Who does the AI voice belong to?
Apart from the creative and human aspects, trouble looms from a completely different direction: Who owns AI voices? In recent months, artists have fought against the unsolicited use of their works as training data for AI models. And there is also resistance from voice actors. The case law here is still in its infancy. Still, it takes little imagination to grasp how complicated the copyright question becomes when an AI has been trained with the audio material of thousands of talents.
There are growing indications that various companies offer AI voices whose models have been trained with recordings for which the rights of use seem unclear. The customers of these providers may also bear the risk of facing legal consequences down the line.
Conclusion
Without question, artificial voices have come a long way. With applications such as Siri, Alexa, and the like, we gladly accept the technical intonation of our digital assistants. However, the situation is completely different for all applications that expect a "human touch." Here, professional voices will remain the first choice.
Photos: Pexels