Safeguarding Privacy: Privacy-Preserving Detection of Mind Wandering and Disengagement Using Federated Learning in Online Education

Anna Bodonhelyi, Mengdi Wang, Efe Bozkir, Babette Bühler, Enkelejda Kasneci

TL;DR

This work tackles the privacy challenges of detecting mind wandering, disengagement, and boredom in online education by using cross-device federated learning (FL) to train models directly on learners' devices. It feeds video-derived facial and gaze features (extracted with EmoNet and OpenFace) into a bi-LSTM model, adds eyeglasses-aware features (MeGlass) to mitigate disturbances caused by eyeglasses, and evaluates six FL algorithms across five datasets. On user-independent splits, federated approaches often match or exceed centralized performance, which indicates that privacy-preserving training is viable and robust to data heterogeneity and lighting variations. The study lays groundwork for real-time, privacy-conscious educational support tools, and outlines ethical considerations and future extensions with differential privacy, secure aggregation, and personalized FL.

Abstract

Since the COVID-19 pandemic, online courses have expanded access to education, yet the absence of direct instructor support challenges learners' ability to self-regulate attention and engagement. Mind wandering and disengagement can be detrimental to learning outcomes, making their automated detection via video-based indicators a promising approach for real-time learner support. However, machine learning-based approaches often require sharing sensitive data, raising privacy concerns. Federated learning offers a privacy-preserving alternative by enabling decentralized model training while also distributing computational load. We propose a framework exploiting cross-device federated learning to address different manifestations of behavioral and cognitive disengagement during remote learning, specifically behavioral disengagement, mind wandering, and boredom. We fit video-based cognitive disengagement detection models using facial expressions and gaze features. By adopting federated learning, we safeguard users' data privacy through privacy-by-design and introduce a novel solution with the potential for real-time learner support. We further address challenges posed by eyeglasses by incorporating related features, enhancing overall model performance. To validate the performance of our approach, we conduct extensive experiments on five datasets and benchmark multiple federated learning algorithms. Our results show great promise for privacy-preserving educational technologies promoting learner engagement.
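
To make the decentralized training idea concrete, the following is a minimal sketch of one server-side aggregation round using sample-weighted averaging in the style of FedAvg. This is an illustration only: the text above does not state which of the benchmarked federated learning algorithms it refers to, and the function name `fedavg`, the `Weights` type, and the flat-list parameter layout are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each model is a dict mapping a parameter name
# to a flat list of floats. Real models would use tensors instead.
Weights = Dict[str, List[float]]


def fedavg(client_updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    client_updates holds (weights, num_local_samples) pairs. Only model
    parameters reach the server; the raw video data stays on the device.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shape as the first client's parameters.
    first_weights, _ = client_updates[0]
    aggregated = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, num_samples in client_updates:
        share = num_samples / total_samples  # this client's contribution
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                aggregated[name][i] += share * value
    return aggregated


# Two toy clients: the second holds three times as much local data.
global_weights = fedavg([
    ({"w": [1.0, 2.0]}, 1),
    ({"w": [3.0, 4.0]}, 3),
])
```

In this toy run the second client contributes three quarters of the average, so clients with more local data pull the global model more strongly. Weighting by sample count is one common choice; other aggregation rules handle heterogeneous clients differently.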

Paper Structure

This paper contains 29 sections, 3 equations, 6 figures, and 11 tables.

Figures (6)

  • Figure 1: Our proposed federated learning algorithm in an online learning scenario, where the server aggregates client models trained on facial features extracted from the students' video data to predict the remote learner state.
  • Figure 2: Example of correct and incorrect eye detection with OpenFace [baltrusaitis2018openface]. In both cases, OpenFace yields a confidence value over 0.97.
  • Figure 3: Dataset visualization based on (a) the number of samples per client and (b) hidden representations.
  • Figure 4: Model architecture. After extracting the EmoNet [toisoul2021estimation] and OpenFace [baltrusaitis2018openface] features, we check for each frame whether OpenFace feature extraction was successful (this flag is saved per frame and reflects OpenFace's confidence in its prediction). Next, we downsample the frames in which a face was detected and create the input vector for the neural network, which consists of two main parts: bi-LSTM and MLP layers.
  • Figure 5: Before and after applying video enhancement with Harmonizer [ke2022harmonizer] to a video from DAiSEE [gupta2016daisee, kamath2016crowdsourced].
  • ...and 1 more figure
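
The preprocessing described in the Figure 4 caption (keep frames where face feature extraction succeeded, then downsample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_frames`, the threshold `min_confidence`, and the downsampling `step` are hypothetical, and the caption does not give the actual values used.

```python
from typing import List, Sequence


def select_frames(
    features: Sequence[List[float]],
    confidences: Sequence[float],
    min_confidence: float = 0.9,  # hypothetical threshold, not from the paper
    step: int = 5,                # hypothetical downsampling factor
) -> List[List[float]]:
    """Keep frames with a confident face detection, then take every step-th one."""
    if len(features) != len(confidences):
        raise ValueError("features and confidences must have the same length")
    if step < 1:
        raise ValueError("step must be at least 1")

    # Drop frames whose per-frame detection confidence is too low.
    kept = [f for f, c in zip(features, confidences) if c >= min_confidence]

    # Downsample the remaining frames to form the sequence input.
    return kept[::step]


# Six toy frames with one feature each; frames 1 and 4 fail detection.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
conf = [0.99, 0.50, 0.98, 0.97, 0.20, 0.99]
sequence = select_frames(frames, conf, min_confidence=0.9, step=2)
```

As the Figure 2 caption notes, OpenFace can report a confidence above 0.97 even when eye detection is wrong, so a confidence filter like this one removes only clear failures and does not guarantee correct gaze features.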