Multimodal Machine Learning for Automated Assessment of Attention-Related Processes during Learning
Babette Bühler
TL;DR
This work addresses the problem of objectively assessing attention-related processes during learning by applying multimodal machine learning to eye-tracking, video, and physiological data. It introduces a fine-grained distinction between aware and unaware mind wandering, shows that multimodal fusion outperforms unimodal baselines, and tests generalizability to in-the-wild and cross-cultural settings using transfer learning. It also develops gaze-synchrony measures to index attention in online learning and hand-raising detection to index behavioral engagement in classrooms, while addressing explainability and privacy concerns for educational practice. The findings point toward scalable, attention-aware learning technologies and data-driven insights into how attentional dynamics influence learning outcomes, with practical implications for interventions and classroom analytics. Overall, the thesis bridges educational theory with state-of-the-art computer science methods to advance fine-grained, scalable assessment of attention in diverse educational contexts.
Abstract
Attention is a key factor for successful learning, with research indicating strong associations between (in)attention and learning outcomes. This dissertation advanced the field by focusing on the automated detection of attention-related processes using eye tracking, computer vision, and machine learning, offering a more objective, continuous, and scalable assessment than traditional methods such as self-reports or observations. It introduced novel computational approaches for assessing various dimensions of (in)attention in online and classroom learning settings and addressed the challenges of precise fine-grained assessment, generalizability, and in-the-wild data quality. First, this dissertation explored the automated detection of mind wandering, a shift in attention away from the learning task. It distinguished aware from unaware mind wandering using a novel multimodal approach that integrated eye-tracking, video, and physiological data. It further examined how well scalable webcam-based detection generalizes across diverse tasks, settings, and target groups. Second, this thesis investigated attention indicators during online learning: eye-tracking analyses revealed significantly greater gaze synchronization among attentive learners. Third, it addressed attention-related processes in classroom learning by detecting hand-raising as an indicator of behavioral engagement using a novel view-invariant and occlusion-robust skeleton-based approach. Overall, this thesis advanced the automated assessment of attention-related processes in educational settings by developing and refining methods for detecting mind wandering, on-task behavior, and behavioral engagement. It bridges educational theory with advanced methods from computer science, enhancing our understanding of attention-related processes that significantly impact learning outcomes and educational practices.
