Integrating Representational Gestures into Automatically Generated Embodied Explanations and its Effects on Understanding and Interaction Quality
Amelie Sophie Robrecht, Hendric Voss, Lisa Gottschalk, Stefan Kopp
TL;DR
The paper investigates how representational gestures affect understanding and interaction quality when an embodied virtual explainer explains the board game Quarto!. It combines beat gestures from a learned speech-driven synthesizer with manually captured iconic gestures, using a SNAPE-based adaptive explanation framework enhanced with LLM-generated utterances and a graph-based, real-time gesture generator. A four-condition online study (baseline, beat, iconic, mixed) shows that iconic and mixed gestures do not outperform the baseline or beat-only conditions and may even hinder deep understanding, while the embodied agent improves understanding relative to prior non-embodied explanations. These findings inform gesture design for multimodal explanations: they highlight cognitive-load considerations and recommend careful selection of gesture type and timing for effective learning and interaction.
Abstract
In human interaction, gestures serve various functions such as marking speech rhythm, highlighting key elements, and supplementing information. These gestures are also observed in explanatory contexts. However, the impact of gestures on explanations provided by virtual agents remains underexplored. This study addresses the effect of gestures in explanation by developing an embodied virtual explainer that integrates both beat gestures and iconic gestures to enhance its automatically generated verbal explanations. Our model combines beat gestures generated by a learned speech-driven synthesis module with manually captured iconic gestures that support the agent's verbal explanations of the board game Quarto!, which serves as the explanation scenario. A user study was carried out to investigate how different types of gestures influence perceived interaction quality and listener understanding. Findings indicate that neither the use of iconic gestures alone nor their combination with beat gestures outperforms the baseline or beat-only conditions in terms of understanding. Nonetheless, compared to prior research on non-embodied explanations, the embodied agent significantly enhances understanding.
