Standards in Kristina’s Avatar; Meeting between ARIA-VALUSPA and KRISTINA

March 22, 2017, by lpmayos

Last month, a workshop between the members of KRISTINA and ARIA-VALUSPA, another H2020 project on interactive virtual agents, was held at Schloss Dagstuhl (a computer science centre in south-western Germany). During the workshop, several meetings took place to discuss the different disciplines involved in creating interactive virtual agents.

From a post of the ARIA-VALUSPA project (by Merijn Bruijnes):
“The ARIA VALUSPA project (Artificial Retrieval of Information Assistants – Virtual Agents with Linguistic Understanding, Social skills, and Personalised Aspects) aims to create Artificial Retrieval of Information Assistants (ARIAs) that are capable of holding multi-modal social interactions in challenging and unexpected situations. The technology developed in the project will be show-cased in a virtual agent called Alice, from the book Alice in Wonderland. A user can interview Alice about her unique perspective on the story.

During the joint seminar, demos from both projects were shown. A notable demo for the ARIA project was presented by Angelo Cafaro, who showed how the virtual agent's handling of user interruptions was realised at the behaviour generation level. For the KRISTINA project, a notable demo was presented by Dominik Schiller, who showed how their agent could react empathically to a depressed user... Additionally, there was an interesting invited keynote by Patrick Gebhard from the DFKI lab. He detailed their Virtual Scene Maker and how it can be used when designing real-time interactive systems. After his talk he demoed the system, and the ease of configurability was impressive. Yet the most impressive demo was by Gerard and Angelo, who in about an hour managed to connect the Greta platform from the ARIA project to the agent web interface from the KRISTINA project. This really showed the importance and effect of standards.”

Ten years ago, a standard to control interactive virtual characters was defined by most of the experts in Embodied Conversational Agents (ECAs): the Behavior Markup Language (BML) [1]. GTI-UPF implemented the web-based Kristina agent using this standard and, although GTI-UPF within KRISTINA only controls the specific behavior of the agent, the agent was implemented as an open-source virtual character on the web. The goal of creating a web-based virtual character that can be controlled via BML (a so-called BML Realizer) is that any researcher can connect their own BML generator without having to install additional software or deal with the creation and support of virtual characters.
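To make the standard concrete, the sketch below assembles a minimal BML block with Python's standard library. The behaviour elements and attribute values are a hedged illustration based on the BML 1.0 draft, not output taken from the KRISTINA realizer.

```python
import xml.etree.ElementTree as ET

# A minimal BML block: a speech behaviour plus a head nod that is
# time-aligned to the utterance via a synchronisation point.
# Element names follow the BML 1.0 draft; the content is illustrative.
bml = ET.Element("bml", id="bml1",
                 xmlns="http://www.bml-initiative.org/bml/bml-1.0")

speech = ET.SubElement(bml, "speech", id="s1")
ET.SubElement(speech, "text").text = "Hello, I am Alice."

# start="s1:start" ties the nod to the beginning of the utterance.
ET.SubElement(bml, "head", id="h1", type="NOD", start="s1:start")

print(ET.tostring(bml, encoding="unicode"))
```

Any realizer that understands this vocabulary can play the block back, which is precisely what decouples BML generators from the character rendering.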

GTI-UPF has been working on a system to easily integrate virtual characters on the web, into which the approach described above fits. Currently, all tools used to create avatars within KRISTINA are open source. GTI-UPF implemented several open-source libraries and scripts to control the behavior of the ECA in WebGLStudio, among them a facial animation system based on valence and arousal as proposed by [2], a web-based lip-synchronization system [3] and some of the BML commands. With the pipeline GTI-UPF has set up, a competent person can create a new operational avatar and integrate it into WebGLStudio with these scripts and libraries in less than a morning of work. This simplification of the creation and support of virtual characters can be a major step towards the testability and replicability of research.
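To illustrate the valence-arousal idea, the sketch below maps a point on the valence-arousal plane to facial blend-shape weights. The shape names and the linear mapping are hypothetical placeholders, not the actual parameters of [2] or of the WebGLStudio implementation.

```python
def face_weights(valence: float, arousal: float) -> dict:
    """Map a point on the valence-arousal plane (both in [-1, 1])
    to illustrative blend-shape weights in [0, 1].

    The shape names and the linear mapping are hypothetical; a real
    system would interpolate between artist-authored expressions.
    """
    v = max(-1.0, min(1.0, valence))
    a = max(-1.0, min(1.0, arousal))
    return {
        "smile":      max(0.0, v),         # positive valence -> smile
        "frown":      max(0.0, -v),        # negative valence -> frown
        "brows_up":   max(0.0, a),         # high arousal -> raised brows
        "eyes_close": max(0.0, -a) * 0.5,  # low arousal -> drowsy eyes
    }

# e.g. a calm, mildly pleased expression
print(face_weights(0.4, -0.2))
```

The appeal of the valence-arousal parameterisation is that a dialogue manager only has to emit two continuous numbers, and the face stays plausible anywhere on the plane.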

The seminar in Dagstuhl proved the usefulness of the approach GTI-UPF has been working on: in about an hour of coding, the Greta platform could control the web-based agent via BML commands. GTI-UPF has developed an agent that can easily be connected to other systems and, as it is a web implementation, the virtual agent can be accessed from any computer and controlled remotely. GTI-UPF plans to further develop the web-based agent, as well as web tools that let others implement their own virtual agents. We expect to provide a full web-based BML Realizer that allows researchers to compare and test their engines in a more standardized way, by means of massive user studies over the web.
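One way such remote control can work is sketched below with Python's standard library: a toy "realizer" endpoint that accepts BML blocks over HTTP. The /bml route and the queueing behaviour are invented for illustration; the actual KRISTINA web interface is not described in this post.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # BML blocks the toy realizer has accepted for playback

class BMLHandler(BaseHTTPRequestHandler):
    """Accepts BML blocks POSTed to a hypothetical /bml route."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"queued")

    def log_message(self, *args):  # silence request logging for the demo
        pass

server = HTTPServer(("127.0.0.1", 0), BMLHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any BML generator (Greta, for instance) could now POST blocks remotely.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/bml", '<bml id="bml1"/>')
resp = conn.getresponse()
print(resp.status, received[0])
```

Because the transport is plain web infrastructure, the generator and the character can run on different machines, which is what made the hour-long Greta hookup possible.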


[1] Stefan Kopp, Brigitte Krenn, Stacy Marsella, Andrew N. Marshall, Catherine Pelachaud, Hannes Pirker, Kristinn R. Thórisson, and Hannes Vilhjálmsson. 2006. Towards a common framework for multimodal generation: the behavior markup language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents (IVA'06), Jonathan Gratch, Michael Young, Ruth Aylett, Daniel Ballin, and Patrick Olivier (Eds.). Springer-Verlag, Berlin, Heidelberg, 205-217.

[2] Romeo, M.: Automated Processes and Intelligent Tools in CG Media Production. PhD thesis, 119-148 (2016)

[3] Llorach, G., Evans, A., Blat, J., Grimm, G., Hohmann, V.: Web-based live speech-driven lip-sync. In: 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games) (2016)

Tags: multilingual intelligent embodied agent, social competence, adaptive dialogue, expressive speech recognition and synthesis, discourse generation, vocal, facial and gestural social and emotional cues