MURMR: A Multimodal Sensing Framework for Automated Group Behavior Analysis in Mixed Reality

Diana Romero, Yasra Chandio, Fatima Anwar, Salma Elmalaki

TL;DR

MURMR addresses the challenge of understanding group coordination in mixed reality by introducing a headset-only, passive sensing framework with two complementary modules: a Structural Analysis Module that builds automated sociograms from multimodal signals, and a Temporal Analysis Module that applies unsupervised clustering to moment-to-moment dyadic interactions. The system is validated in a 48-participant study, which shows that structural patterns are relatively stable at the session level yet highly dynamic across 32-second windows, while temporal clustering uncovers distinct collaboration modes such as rhythmic leadership, animated collaboration, monotone focus, and instructor demonstrations. Key contributions include a passive multimodal sensing pipeline, on-device sociogram construction with modality fusion, and a deep-clustering-based temporal framework with interpretable cluster semantics and cross-dyad generalizability. In practice, MURMR enables real-time group monitoring and rich post-hoc understanding of immersive collaboration, laying the groundwork for adaptive MR systems that respond to evolving group dynamics.

Abstract

Collaboration is at the heart of many complex tasks, and mixed reality (MR) offers a powerful new medium to support it. Understanding how teams coordinate in immersive environments is critical for designing effective MR applications that support collaborative work. However, existing methods rely on external observation systems and manual annotation, lacking deployable solutions for capturing temporal collaboration dynamics. We present MURMR, a system with two complementary modules that passively analyze multimodal interaction data from commodity MR headsets. Our structural analysis module constructs automated sociograms revealing group organization and roles, while our temporal analysis module performs unsupervised clustering to identify moment-to-moment dyad behavior patterns. Through a 48-participant study with egocentric video validation, we demonstrate that the structural module captures stable interaction patterns while the temporal module reveals substantial behavioral variability that session-level approaches miss. This dual-module architecture advances collaboration research by establishing that structural and temporal dynamics require separate analytical approaches, enabling both real-time group monitoring and detailed behavioral understanding in immersive collaborative environments.

Paper Structure

This paper contains 38 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The MURMR framework for sensing and analyzing collaborative group behavior in MR first synchronizes multimodal sensor data to maintain consistency, then operates through two complementary modules. [Left] At the macro level, session-long sensor data is aggregated to build sociograms that reveal overall group structure. At the micro level, a temporal analysis of short interaction windows classifies fine-grained behavioral patterns. [Right] These analytics generate comprehensive post-session insights, including structural network metrics, temporal behavioral segmentation timelines, and dyadic interaction patterns that capture the various aspects of collaborative group behavior.
  • Figure 2: Windowed-session analysis to assess behavioral patterns obscured by session-level aggregation. Conversation reciprocity for representative Groups 8, 10, and 12 (left), Group 12’s multimodal density trajectory (center), and density variation (right).
  • Figure 3: 3D UMAP of 71,404 window embeddings. Colors denote clusters; distinct manifolds confirm high silhouette quality.
  • Figure 4: Heatmap of clusters (rows) vs. features (columns), where color intensity shows each feature’s deviation from its mean.
  • Figure 5: Fused eigenvector over time for one pair in Group 10, with the transition from Cluster 3 to Cluster 1 highlighted.
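
The windowed-session analysis summarized in Figure 2 can be illustrated with a minimal sketch. It assumes a hypothetical event format of `(timestamp_s, source, target)` tuples and uses the standard graph-theoretic definitions of directed density and edge reciprocity; the paper's actual feature extraction, modality fusion, and metric definitions are not given in this overview, so every function name and the data layout below are illustrative assumptions rather than MURMR's implementation.

```python
from collections import defaultdict

# 32-second windows, as mentioned in the TL;DR; everything else is assumed.
WINDOW_S = 32.0


def windowed_edges(events, window_s=WINDOW_S):
    """Group directed interaction events into fixed-length time windows.

    events: iterable of (timestamp_s, source, target) tuples (hypothetical format).
    Returns {window_index: {(source, target): count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for t, src, dst in events:
        if src == dst:
            continue  # ignore self-directed events
        windows[int(t // window_s)][(src, dst)] += 1
    return {w: dict(edges) for w, edges in windows.items()}


def density(edges, n_members):
    """Directed density: observed directed edges / possible directed edges."""
    possible = n_members * (n_members - 1)
    return len(edges) / possible if possible else 0.0


def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    if not edges:
        return 0.0
    mutual = sum(1 for (a, b) in edges if (b, a) in edges)
    return mutual / len(edges)


if __name__ == "__main__":
    # Toy events for a three-person group (made-up data, for illustration only).
    events = [(1.0, "A", "B"), (5.0, "B", "A"), (10.0, "A", "C"), (40.0, "C", "A")]
    for idx, edges in sorted(windowed_edges(events).items()):
        print(idx, density(edges, n_members=3), reciprocity(edges))
```

Computing the same metrics once over the whole session and once per window makes the contrast in Figure 2 concrete: a session-level sociogram can look stable while the per-window values vary considerably.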