3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments
Eduardo Davalos, Yike Zhang, Ashwin T. S., Joyce H. Fonteles, Umesh Timalsina, Gautam Biswas
TL;DR
This work tackles the challenge of studying collaborative interactions in mixed-reality environments by enabling robust, multi-user 3D gaze tracking without specialized hardware. It introduces a two-stage framework: a Face Recognition Module for identity-consistent tracking, and a Gaze Analysis Module that performs 3D scene reconstruction, 2D-to-3D reprojection, and gaze ray tracing to 3D objects of interest (OOIs), using L2CS-Net for gaze vectors and ZoeDepth for metric depth. Key contributions include continuous re-identification (ReID) tracklets, 3D OOI encoding, 3D gaze ray tracing, and a multimodal timeline visualization, demonstrated with social-network-style gaze analysis. The approach is validated in a real classroom setting, showing practical potential for educational analytics and collaborative interaction studies, while acknowledging limitations in error accumulation and in the computational demands of real-time deployment.
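To make the 2D-to-3D reprojection and gaze ray tracing steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a metric depth value per pixel (such as a depth estimator might supply), gaze given as pitch and yaw in radians, and OOIs approximated as bounding spheres. The function names, the angle convention, and the sphere approximation are all hypothetical choices made for illustration.

```python
import numpy as np

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth (meters) to a 3D point in camera coordinates.
    Assumes a pinhole model: x right, y down, z forward from the camera."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m], dtype=float)

def gaze_direction(pitch, yaw):
    """Convert pitch/yaw (radians) to a unit gaze vector.
    Assumed convention: pitch = yaw = 0 means looking straight at the camera (-z)."""
    d = np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])
    return d / np.linalg.norm(d)

def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit-direction ray to a sphere, or None if the ray misses."""
    oc = center - origin
    t_closest = float(np.dot(oc, direction))  # closest approach along the ray
    if t_closest < 0.0:
        return None  # sphere center lies behind the viewer
    d2 = float(np.dot(oc, oc)) - t_closest ** 2  # squared ray-to-center distance
    if d2 > radius ** 2:
        return None
    return max(t_closest - np.sqrt(radius ** 2 - d2), 0.0)

def first_gaze_target(origin, direction, oois):
    """Return the name of the nearest OOI hit by the gaze ray, or None.
    `oois` maps a name to a (center, radius) pair; this layout is hypothetical."""
    best_name, best_t = None, np.inf
    for name, (center, radius) in oois.items():
        t = ray_sphere_distance(origin, direction, np.asarray(center, float), radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name

# Illustrative values only: a face at the image center, 2 m from the camera.
eye = backproject(320, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
oois = {
    "camera_rig": ((0.0, 0.0, 0.5), 0.2),
    "whiteboard": ((1.5, 0.0, 2.0), 0.3),
}
target = first_gaze_target(eye, gaze_direction(0.0, 0.0), oois)
```

Running the tracing step per participant and per frame would yield the who-looks-at-what records that a timeline or social-network-style analysis could consume. Because the eye position comes from estimated depth, any depth error shifts the ray origin, which is one way errors can accumulate through such a pipeline.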
Abstract
This study presents a novel framework for 3D gaze tracking tailored for mixed-reality settings, aimed at enhancing joint attention and collaborative efforts in team-based scenarios. Conventional gaze tracking, often limited by monocular cameras and traditional eye-tracking apparatus, struggles to synchronize and analyze data from multiple participants simultaneously in group contexts. Our proposed framework leverages state-of-the-art computer vision and machine learning techniques to overcome these obstacles, enabling precise 3D gaze estimation without dependence on specialized hardware or complex data fusion. Using facial recognition and deep learning, the framework achieves real-time tracking of gaze patterns across several individuals, addresses common depth estimation errors, and ensures spatial and identity consistency within the dataset. Empirical results demonstrate the accuracy and reliability of our method in group environments. The framework thereby provides mechanisms for significant advances in behavior and interaction analysis for educational and professional training applications in dynamic and unstructured environments.
