Speech Recognition Startup follows the legacy of Star Trek

By Petra Wiesmayer

Gene Roddenberry invented the universal translator with automatic speech recognition back in 1966, even if only for Captain Kirk, Spock and their colleagues on the spaceship Enterprise. Some 50 years later, a startup from Gilching-Oberpfaffenhofen near Munich is using intelligent audio analysis to bring the real world a small step closer to a universal translator.

A true universal translator is still a long way off, but the foundations are already being laid. “Today’s technology is so advanced that the characteristics of a person and their emotions can be recognized and understood regardless of the language spoken,” explains Dagmar Schuller, Managing Director of audEERING. Together with her colleagues Florian Eyben, Björn Schuller and Martin Wöllmer, she founded audEERING in 2012 out of a joint research project at TU Munich.


The Munich-based startup is now a leader in the field of intelligent speech analysis. Its software “can be used on standard devices such as smartphones and tablets, but also independently of them” and is becoming increasingly important not only for the automotive industry, but especially in medicine and care. “A doctor who has the software installed on his PC can use intelligent speech analysis while the patient is speaking and thereby learn how the emotional state and other voice characteristics change and develop over time,” says Schuller. “This is particularly important when, for example, dementia research is concerned with the early detection of anomalies or deviations. In biography work, it is easier for many patients to record their thoughts by voice input than to keep a written diary. This also makes it possible to evaluate the data automatically and to recognize a person’s subconscious emotional state.”


Thanks to the unique VocEmoApI technology, it is possible to recognize more than 50 emotions and states, such as satisfaction, anger or sadness, and to spot developments early on. This could make it possible to tell much earlier whether indicators of dementia are present. In the prevention of Parkinson’s disease, artificial intelligence also recognizes changes in the voice triggered by paralysis of the fine laryngeal muscles. These characteristics can be recognized well before other general symptoms occur, in the best case many months earlier. However, artificial intelligence still cannot do everything, and the human being remains at the center. “Technology recognizes the characteristics of voice and acoustics, but what is still important is the human being, who is then able to interpret and evaluate these data.”

One current audEERING project is funded by the EU. As part of the ECoWeB research project, audEERING is working in a consortium to develop an app with which young people can recognize, in a neutral way, whether they have depressive moods. “This is particularly helpful for young people who place less trust in people and have easy access to technology.”


The software from Munich also plays a role in the United States. The US company Jibo uses audEERING’s intelligent audio analysis technology for the first social robot, which bears the same name. The 30-centimeter Jibo recognizes human speech in combination with video and relies on multimodal input recognition as the basis for its social competence.

Photo: audEERING
