Spoken language ambiguities

Dec. 28, 2017, by Bianca Vieru

Extrapolating from the linguistic ambiguity in the above image, adding some noise and an assortment of regional and non-native accents, it is easy to imagine the difficulties faced by automatic speech recognition systems and all the downstream processing technologies that rely on them.

Speech recognition belongs to the family of technologies that help machines communicate in meaningful ways with humans and their environment (according to ComScore, 50% of all searches will be voice searches by 2020). It is a key component of several trending technologies such as speech-to-speech translation, conversational platforms, and dialog systems.

The performance of the speech recognition system influences the performance of all subsequent modules in a processing chain: an incorrectly recognized sentence is more likely to yield a wrong answer than a correctly recognized one. Depending on the processing step, even a perfectly recognized sentence may be misinterpreted, since spoken language can be ambiguous when taken out of context.

In KRISTINA, speech recognition is the entry point to the global system. The spoken language understanding module is applied to the hypothesized text output by the speech transcription system. On the speech recognition side, in addition to the spoken words, the system needs to output the correct casing and punctuation marks required for language analysis. On the analysis side, most systems are trained on textual data and need to be modified to cope with the characteristics and imprecisions of spoken language.

Despite major progress in recent years, due in part to the availability of large corpora and advanced machine learning techniques, all of these technologies remain heavily domain dependent. To facilitate the customization of the systems, the user partners have collected specific data sets for KRISTINA, which were used in the project to train the systems and periodically evaluate the advances. Several prototypes were developed and assessed in user trials to validate the complete system.
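To make the casing-and-punctuation problem concrete, here is a toy sketch (not the KRISTINA system's actual method, which would use trained models): it takes raw lowercase ASR tokens plus hypothetical pause cues and naively treats long pauses as sentence boundaries. The function name and the pause representation are illustrative assumptions.

```python
# Toy illustration of punctuation/casing restoration on raw ASR output.
# Real systems learn this as a sequence-labeling task; this rule-based
# sketch only shows what the task is, not how production systems solve it.

def restore_case_punct(tokens, pause_after):
    """tokens: lowercase ASR words; pause_after: set of token indices
    after which a long pause was detected (naive boundary cue)."""
    out = []
    start_of_sentence = True
    for i, tok in enumerate(tokens):
        word = tok.capitalize() if start_of_sentence else tok
        start_of_sentence = False
        if i in pause_after:  # pretend a long pause ends the sentence
            word += "."
            start_of_sentence = True
        out.append(word)
    if out and not out[-1].endswith("."):
        out[-1] += "."  # close the final sentence
    return " ".join(out)

raw = ["the", "patient", "slept", "well", "she", "ate", "at", "noon"]
print(restore_case_punct(raw, pause_after={3}))
# → The patient slept well. She ate at noon.
```

Without the restored boundary, a downstream analysis module would see "the patient slept well she ate at noon" and could easily attach the wrong subject to each verb, which is exactly why the transcription system must emit more than the bare word sequence.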

Tags: multilingual intelligent embodied agent, social competence, adaptive dialogue, expressive speech recognition and synthesis, discourse generation, vocal, facial and gestural social and emotional cues