Multimodal Machine Learning for Automated Assessment of Attention-Related Processes during Learning

Babette Bühler

TL;DR

This work tackles the critical problem of objectively assessing attention-related processes during learning by leveraging multimodal machine learning on eye-tracking, video, and physiological data. It introduces a fine-grained distinction between aware and unaware mind wandering, demonstrates multimodal fusion that outperforms unimodal baselines, and tests generalizability to in-the-wild and cross-cultural settings using transfer learning. It also develops gaze-synchrony and hand-raising detection methods to index attention online and in classroom contexts, respectively, while addressing explainability and privacy concerns for educational practice. The findings provide a path toward scalable, attention-aware learning technologies and data-driven insights into how attentional dynamics influence learning outcomes, with practical implications for interventions and classroom analytics. Overall, the thesis bridges education theory with state-of-the-art computer science methods to advance fine-grained, scalable assessment of attention in diverse educational contexts.

Abstract

Attention is a key factor for successful learning, with research indicating strong associations between (in)attention and learning outcomes. This dissertation advanced the field by focusing on the automated detection of attention-related processes using eye tracking, computer vision, and machine learning, offering a more objective, continuous, and scalable assessment than traditional methods such as self-reports or observations. It introduced novel computational approaches for assessing various dimensions of (in)attention in online and classroom learning settings and addressed the challenges of precise fine-grained assessment, generalizability, and in-the-wild data quality. First, this dissertation explored the automated detection of mind wandering, a shift in attention away from the learning task. Aware and unaware mind wandering were distinguished using a novel multimodal approach that integrated eye tracking, video, and physiological data. Further, the generalizability of scalable webcam-based detection across diverse tasks, settings, and target groups was examined. Second, this thesis investigated attention indicators during online learning. Eye-tracking analyses revealed significantly greater gaze synchronization among attentive learners. Third, it addressed attention-related processes in classroom learning by detecting hand-raising as an indicator of behavioral engagement using a novel view-invariant and occlusion-robust skeleton-based approach. This thesis advanced the automated assessment of attention-related processes within educational settings by developing and refining methods for detecting mind wandering, on-task behavior, and behavioral engagement. It bridges educational theory with advanced methods from computer science, enhancing our understanding of attention-related processes that significantly impact learning outcomes and educational practices.
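
As a rough illustration of what a gaze-synchronization measure could look like, the sketch below computes, for each time step, the mean pairwise Euclidean distance between the gaze points of several learners watching the same content. Smaller distances would indicate more synchronized gaze. This is an assumption made for illustration only: the abstract does not specify how synchrony was computed, and the function name `mean_pairwise_gaze_distance` and the input layout (one list of `(x, y)` points per learner, already aligned in time) are hypothetical.

```python
from itertools import combinations
from math import hypot
from statistics import mean


def mean_pairwise_gaze_distance(gaze):
    """Return one synchrony score per time step (lower = more synchronized).

    gaze: dict mapping a learner id to a list of (x, y) gaze points,
    one point per time step. All lists must have the same length.
    NOTE: hypothetical measure for illustration, not the thesis's method.
    """
    learners = list(gaze)
    if len(learners) < 2:
        raise ValueError("need at least two learners to compare gaze")
    n_steps = len(gaze[learners[0]])
    if any(len(gaze[learner]) != n_steps for learner in learners):
        raise ValueError("all gaze sequences must have the same length")

    scores = []
    for t in range(n_steps):
        # Euclidean distance between every pair of learners at time step t.
        pair_distances = [
            hypot(gaze[a][t][0] - gaze[b][t][0], gaze[a][t][1] - gaze[b][t][1])
            for a, b in combinations(learners, 2)
        ]
        scores.append(mean(pair_distances))
    return scores
```

Under this sketch, one could compare the average score of learners labeled attentive with that of learners labeled inattentive. Real eye-tracking data would also need handling of missing samples and blinks, which is omitted here.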

Paper Structure

This paper contains 191 sections, 7 equations, 54 figures, and 25 tables.

Figures (54)

  • Figure 1: Webcam Image With Depiction of OpenFace Head Pose, Gaze and Facial Landmark Features and Classroom Video Frame With OpenPose Pose Estimations.
  • Figure 2: Remote SMI Eye Tracker and Exemplary Scanpath Visualization during Remote Learning.
  • Figure 3: Empatica E4 Wristband and EDA and BVP (Heart Rate) Signals.
  • Figure 4: Schematic Pipeline for Predictive Modeling of Attentional States. ML -- Machine Learning.
  • Figure 5: Schematic Overview of the Contributions in this Dissertation, Structured by Attention-Related Processes and Learning Settings.
  • ...and 49 more figures