Adaptive Gen-AI Guidance in Virtual Reality: A Multimodal Exploration of Engagement in Neapolitan Pizza-Making

Ka Hei Carrie Lau, Sema Sen, Philipp Stark, Efe Bozkir, Enkelejda Kasneci

TL;DR

This work addresses the challenge of evaluating adaptive Gen-AI guidance in VR for procedural learning in intangible cultural heritage. It introduces Neapolitan Pizza VR, a multimodal, AI-assisted VR environment where adaptivity is varied across none, moderate, and high levels, with engagement measured via total time, avatar dwell time, head movements, and verbal interaction in 54 participants. The key finding is that moderate adaptivity yields the strongest multimodal engagement by increasing visual attention to the AI tutor and reducing unnecessary exploration, while higher adaptivity provides no additional benefit and may constrain autonomy. The study demonstrates the value of multimodal metrics for assessing dynamic adaptive systems and offers practical design guidelines for balancing adaptivity in VR-based cultural learning, with open-science provisions to foster replication and extension.

Abstract

Virtual reality (VR) offers promising opportunities for procedural learning, particularly in preserving intangible cultural heritage. Advances in generative artificial intelligence (Gen-AI) further enrich these experiences by enabling adaptive learning pathways. However, evaluating such adaptive systems using traditional temporal metrics remains challenging due to the inherent variability in Gen-AI response times. To address this, our study employs multimodal behavioural metrics, including visual attention, physical exploratory behaviour, and verbal interaction, to assess user engagement in an adaptive VR environment. In a controlled experiment with 54 participants, we compared three levels of adaptivity (high, moderate, and non-adaptive baseline) within a Neapolitan pizza-making VR experience. Results show that moderate adaptivity optimally enhances user engagement, significantly reducing unnecessary exploratory behaviour and increasing focused visual attention on the AI avatar. Our findings suggest that a balanced level of adaptive AI provides the most effective user support, offering practical design recommendations for future adaptive educational technologies.
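To make the behavioural metrics named above more concrete, the following is a minimal sketch of how avatar dwell time and a simple head-movement measure could be derived from timestamped head-tracking samples. It is not the authors' pipeline: the `Sample` record, its `on_avatar` flag, and both function names are hypothetical, and the choice of yaw-only cumulative rotation as a proxy for exploratory behaviour is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hypothetical log record from the headset (assumed schema)."""
    t: float          # timestamp in seconds
    yaw_deg: float    # head yaw in degrees, in [0, 360)
    on_avatar: bool   # True if the view ray hits the AI avatar at time t


def avatar_dwell_time(samples: List[Sample]) -> float:
    """Total seconds spent looking at the avatar.

    Each interval between consecutive samples is credited to the avatar
    if the earlier sample of the pair was on the avatar.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if prev.on_avatar:
            total += cur.t - prev.t
    return total


def cumulative_head_rotation(samples: List[Sample]) -> float:
    """Sum of absolute yaw changes in degrees (assumed exploration proxy).

    Differences are wrapped into [-180, 180) so that a turn from 10 to
    350 degrees counts as 20 degrees, not 340.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        delta = (cur.yaw_deg - prev.yaw_deg + 180.0) % 360.0 - 180.0
        total += abs(delta)
    return total


if __name__ == "__main__":
    # Tiny synthetic log, invented purely for illustration.
    log = [
        Sample(0.0, 0.0, False),
        Sample(0.5, 10.0, True),
        Sample(1.0, 350.0, True),
        Sample(1.5, 350.0, False),
    ]
    print("dwell time (s):", avatar_dwell_time(log))
    print("head rotation (deg):", cumulative_head_rotation(log))
```

Under these assumptions, per-participant values could then be compared across the three adaptivity conditions. Unlike total completion time, neither measure depends directly on how long the Gen-AI tutor takes to respond.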

Paper Structure

This paper contains 31 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Architecture for creating an adaptive VR experience. The simplified VR environment is for illustration only; the actual setup is shown in Figure \ref{fig:vr_experience}.
  • Figure 2: Experiment setup with Varjo XR-3 headset and HTC Vive controllers.
  • Figure 3: VR scene showing avatar interaction within the environment.
  • Figure 4: Stages in the experimental setup: (1) Onboarding, (2) Gameplay, and (3) Poster exploration. Tutor responses and poster content adapt according to the assigned condition: non-adaptive (baseline), ingredient-based (moderate), or demographics- and behaviour-based (high).
  • Figure 5: Total experiment time (seconds) across adaptivity conditions (No, Moderate, High). Significance levels are indicated by ****, corresponding to $p < .0001$.
  • ...and 4 more figures