MURMR: A Multimodal Sensing Framework for Automated Group Behavior Analysis in Mixed Reality
Diana Romero, Yasra Chandio, Fatima Anwar, Salma Elmalaki
TL;DR
MURMR addresses the challenge of understanding group coordination in mixed reality (MR) with a headset-only, passive sensing framework built from two complementary modules: a Structural Analysis Module that builds automated sociograms from multimodal signals, and a Temporal Analysis Module that clusters moment-to-moment dyadic interactions without supervision. A 48-participant study validates the system: structural patterns are relatively stable across sessions yet highly dynamic at the 32-second window level, while temporal clustering uncovers distinct collaboration modes such as rhythmic leadership, animated collaboration, monotone focus, and instructor demonstrations. Key contributions include a passive multimodal sensing pipeline, on-device sociogram construction with modality fusion, and a deep clustering-based temporal framework with interpretable cluster semantics and cross-dyad generalizability. Practically, MURMR enables real-time group monitoring and rich post-hoc analysis of immersive collaboration, laying groundwork for adaptive MR systems that respond to evolving group dynamics.
Abstract
Collaboration is at the heart of many complex tasks, and mixed reality (MR) offers a powerful new medium to support it. Understanding how teams coordinate in immersive environments is critical for designing effective MR applications that support collaborative work. However, existing methods rely on external observation systems and manual annotation, and none offers a deployable way to capture temporal collaboration dynamics. We present MURMR, a system with two complementary modules that passively analyze multimodal interaction data from commodity MR headsets. Our structural analysis module constructs automated sociograms that reveal group organization and roles, while our temporal analysis module performs unsupervised clustering to identify moment-to-moment dyad behavior patterns. Through a 48-participant study with egocentric video validation, we demonstrate that the structural module captures stable interaction patterns while the temporal module reveals substantial behavioral variability that session-level approaches miss. This dual-module architecture advances collaboration research by establishing that structural and temporal dynamics require separate analytical approaches, enabling both real-time group monitoring and detailed behavioral understanding in immersive collaborative environments.
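To make the idea of a windowed, fused sociogram concrete, the following is a minimal sketch and not the authors' implementation. It assumes interaction events arrive as (timestamp, source, target, modality, value) tuples and fuses them into directed edge weights per 32-second window. The modality names (gaze, speech, proximity), the fusion weights, and the function names are all hypothetical; the summary above does not specify them.

```python
from collections import defaultdict

WINDOW_S = 32.0  # window length taken from the summary above

# Hypothetical modalities and fusion weights (assumptions, not from the source).
MODALITY_WEIGHTS = {"gaze": 0.5, "speech": 0.3, "proximity": 0.2}


def build_sociograms(events, window_s=WINDOW_S, weights=MODALITY_WEIGHTS):
    """Group events into fixed windows and fuse modalities per directed edge.

    events: iterable of (timestamp_s, source, target, modality, value).
    Returns {window_index: {(source, target): fused_weight}}.
    """
    windows = defaultdict(lambda: defaultdict(float))
    for t, src, dst, modality, value in events:
        # Skip self-directed events and modalities without a fusion weight.
        if src == dst or modality not in weights:
            continue
        window_index = int(t // window_s)
        windows[window_index][(src, dst)] += weights[modality] * value
    return {w: dict(edges) for w, edges in windows.items()}


def aggregate(sociograms):
    """Sum per-window edge weights into one session-level sociogram."""
    total = defaultdict(float)
    for edges in sociograms.values():
        for pair, weight in edges.items():
            total[pair] += weight
    return dict(total)


# Toy data: participant labels and values are made up for illustration.
events = [
    (1.0, "A", "B", "gaze", 2.0),
    (5.0, "A", "B", "speech", 1.0),
    (40.0, "B", "C", "gaze", 4.0),
    (41.0, "A", "A", "gaze", 9.0),  # self-directed, ignored
]
per_window = build_sociograms(events)
session = aggregate(per_window)
```

Comparing `per_window` with `session` mirrors the contrast drawn above: the session-level graph can look stable while the individual 32-second windows differ considerably from one another.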
