Since U Been Gone: Augmenting Context-Aware Transcriptions for Re-engaging in Immersive VR Meetings
Geonsun Lee, Yue Yang, Jennifer Healey, Dinesh Manocha
TL;DR
The paper addresses the challenge of sustaining engagement in immersive VR meetings after disruptions by introducing EngageSync, a context-aware avatar-fixed transcription interface that provides live transcripts and LLM-generated summaries to support re-engagement while preserving social presence. EngageSync operates in two modes (Engagement and Re-engagement) driven by gaze, speech activity, and pinch gestures, and it delivers on-demand access with automatic mode switching. Through formative and user studies across small and mid-sized groups, the authors show that EngageSync improves social presence, increases attention to avatars, reduces re-engagement time, and enhances information recall, with stronger effects in larger groups. The work offers design insights for adaptive transcription in VR, demonstrates the practicality of context-aware captions, and suggests that avatar-fixed, gaze-triggered, on-demand interfaces can better balance immersion with information catch-up in immersive meetings.
Abstract
Maintaining engagement in immersive meetings is challenging, particularly when users must catch up on missed content after disruptions. Transcription interfaces can help, but table-fixed panels can distract users from the group and diminish social presence, whereas avatar-fixed captions fail to provide past context. We present EngageSync, a context-aware avatar-fixed transcription interface that adapts to user engagement, offering live transcriptions and LLM-generated summaries to help users catch up while preserving social presence. We implemented a live VR meeting setup for a 12-participant formative study and elicited design considerations. In two user studies with small (3 avatars) and mid-sized (7 avatars) groups, EngageSync significantly improved social presence (p < .05) and increased the time spent gazing at others in the group rather than at the interface, compared with table-fixed panels. It also reduced re-engagement time and increased information recall (p < .05) compared with avatar-fixed interfaces, with stronger effects in mid-sized groups (p < .01).
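
The two-mode behavior summarized above (an Engagement mode and a Re-engagement mode, driven by gaze, speech activity, and pinch gestures) can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: the class and field names, the 5-second threshold, and the specific transition rules (automatic switch when the user looks back after missing speech, pinch to open or dismiss) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ENGAGEMENT = "engagement"        # assumed: live transcript only
    RE_ENGAGEMENT = "re_engagement"  # assumed: summary of missed content


@dataclass
class Signals:
    """One sample of the three inputs named in the text (field names are hypothetical)."""
    gazing_at_avatars: bool  # is the user's gaze on any avatar?
    others_speaking: bool    # is another participant currently talking?
    pinch: bool              # did the user pinch during this sample?
    dt: float                # seconds since the previous sample


class ModeSwitcher:
    """Hypothetical transition rules; the threshold is an assumption, not from the paper."""

    def __init__(self, away_threshold_s: float = 5.0) -> None:
        self.away_threshold_s = away_threshold_s
        self.mode = Mode.ENGAGEMENT
        self.missed_s = 0.0  # speech time accumulated while the user looked away

    def update(self, s: Signals) -> Mode:
        if self.mode is Mode.RE_ENGAGEMENT:
            # Assumed: a pinch dismisses the summary and returns to live captions.
            if s.pinch:
                self.mode = Mode.ENGAGEMENT
                self.missed_s = 0.0
            return self.mode

        # Engagement mode from here on.
        if s.pinch:
            # Assumed: a pinch opens the summary on demand.
            self.mode = Mode.RE_ENGAGEMENT
        elif s.gazing_at_avatars:
            if self.missed_s >= self.away_threshold_s:
                # Automatic switch: the user returns after missing enough speech.
                self.mode = Mode.RE_ENGAGEMENT
            else:
                # A brief glance away does not count as a disruption.
                self.missed_s = 0.0
        elif s.others_speaking:
            # Looking away while others talk accumulates missed content.
            self.missed_s += s.dt
        return self.mode


if __name__ == "__main__":
    switcher = ModeSwitcher()
    away = Signals(gazing_at_avatars=False, others_speaking=True, pinch=False, dt=1.0)
    back = Signals(gazing_at_avatars=True, others_speaking=True, pinch=False, dt=1.0)
    for _ in range(6):
        switcher.update(away)
    print(switcher.update(back))
```

In this sketch, looking away for six seconds of speech and then looking back triggers the automatic switch, while a pinch gives on-demand access in either direction. A real system would need tuned thresholds and a richer engagement estimate than a single gaze flag.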
