Adaptive Gen-AI Guidance in Virtual Reality: A Multimodal Exploration of Engagement in Neapolitan Pizza-Making
Ka Hei Carrie Lau, Sema Sen, Philipp Stark, Efe Bozkir, Enkelejda Kasneci
TL;DR
This work addresses the challenge of evaluating adaptive Gen-AI guidance in VR for procedural learning of intangible cultural heritage. It introduces Neapolitan Pizza VR, a multimodal, AI-assisted VR environment in which adaptivity is varied across three levels (none, moderate, and high). In a study with 54 participants, engagement is measured via total time, avatar dwell time, head movements, and verbal interaction. The key finding is that moderate adaptivity yields the strongest multimodal engagement, increasing visual attention to the AI tutor and reducing unnecessary exploration, while high adaptivity provides no additional benefit and may constrain autonomy. The study demonstrates the value of multimodal metrics for assessing dynamic adaptive systems and offers practical design guidelines for balancing adaptivity in VR-based cultural learning, with open-science provisions to foster replication and extension.
Abstract
Virtual reality (VR) offers promising opportunities for procedural learning, particularly in preserving intangible cultural heritage. Advances in generative artificial intelligence (Gen-AI) further enrich these experiences by enabling adaptive learning pathways. However, evaluating such adaptive systems using traditional temporal metrics remains challenging due to the inherent variability in Gen-AI response times. To address this, our study employs multimodal behavioural metrics, including visual attention, physical exploratory behaviour, and verbal interaction, to assess user engagement in an adaptive VR environment. In a controlled experiment with 54 participants, we compared three levels of adaptivity (high, moderate, and a non-adaptive baseline) within a Neapolitan pizza-making VR experience. Results show that moderate adaptivity enhances user engagement the most, significantly reducing unnecessary exploratory behaviour and increasing focused visual attention on the AI avatar. Our findings suggest that a balanced level of adaptive AI provides the most effective user support, and we offer practical design recommendations for future adaptive educational technologies.
