Synthetic speech can be a fearful object these days when paired with deepfakes and other AI deceptions, but it’s also an indispensable tool for anyone who can no longer speak on their own. Acapela Group has these folks squarely in mind with its new “my own voice” service, which lets anyone train an AI voice profile for free.
Acapela has been in the text-to-speech space for around 25 years. It was recently acquired by tech accessibility giant Tobii Dynavox, though it still operates independently.
Like many industries, accessibility has been heavily influenced by the advent of consumer-scale machine learning. Seven or eight years ago, Acapela co-founder Remy Cadic recalls, customizing a synthetic voice for yourself was not only tedious, but the results weren’t particularly good either.
“It was very time consuming — the patient had to train for 8 hours. Now we can bank a voice with just 50 sentences recorded; it takes about 10 minutes and the voice is ready the next day,” he said. “There’s definitely a revolution going on with neural text-to-speech techniques.”
Having a speech generator that uses one’s own voice is something a growing number of people can appreciate, since choosing a stock voice from a list is a bit dehumanizing. Many would rather use their own voice, but until recently that wasn’t an option.
They weren’t kidding about how quick and easy it is: I went through the new “my own voice” process, and it really was just 50 short sentences, drawn from what seemed to be a random corpus of novels, recipe books, and articles. The recording interface was simple and easy to navigate, and sure enough, a day or so later my voice was ready to use. The quality is fine: not uncanny the way some models can be, but clearly my own voice (as advertised) and able to handle any sentence I threw at it on the demo page.
Now that my voice is banked, if I ever need it I can download it for a fee and use it on any compatible speech-generation system. Obviously this includes Tobii Dynavox’s TD Talk and devices. The company released a new device just last week, in fact, and these things are getting pretty sleek.
And that’s the real point of all this — it’s not a technical demonstration of the power of neural voice tech or a demo that lets anyone feed it a celebrity voice to clone. It’s a tool made specifically for people who until recently may have had no options or at best a difficult, complex process if they wanted to preserve their voice.
Many facing degenerative conditions, cancer, or certain procedures know that within a few months or years they may not be able to speak well or at all anymore. Making the process of banking their voice as easy as possible is a service many will appreciate.
“One big advantage is we also customize for children — we’ve made the recording script easier to read, and tuned the system to make the quality of children’s synthetic voices better. We were the first in the world to do that, and we’re still going in this direction,” said Cadic.
Being able to re-record or artificially age a banked voice is a new and challenging capability, but one that seems to be getting results.
The compatibility with offline devices that don’t have the latest neural processing chip is a key differentiator as well. “There are online solutions where it’s easy to create a voice, but it’s only available via the cloud, and that’s just not practical,” he said.
Incidentally, while the 50-sentence process is great for folks who can still read and speak, a voice can also be trained on existing recordings of people who have since lost that ability. It just isn’t quite so simple.
The company has also found that diversity and thoughtfulness in the training process are as important here as in other AI applications. Cadic pointed out that an issue with some super-fast training techniques is that “it will pretty much just try to find the speaker in the training material that’s closest to the user. But if there isn’t a speaker in the training close to the original voice, it just won’t sound like it.”
Acapela product manager Nicolas Mazars added that, like many AI problems rooted in insufficient training data, this one is not evenly distributed: “That process works well for the average 50-year-old white guy, but not if you’re an African-American man, or you don’t speak English well. We work in 23 languages, and have many users who have disabilities. We try to rely on user feedback and develop something for them, by them.”
The recording and banking process is free; you can sign up for an account and record your own synthetic voice in minutes. You only pay if you want to download and install it on a device.