A General Model for Detecting Learner Engagement: Implementation and Evaluation

Somayeh Malekshahi, Javad M. Kheyridoost, Omid Fatemi

TL;DR

The paper addresses automatic detection of learner engagement in e-learning by introducing a general, temporally aware model that processes frame sequences from videos while preserving temporal order. It builds per-frame $d=7$ emotion features from a CNN-based emotion detector and uses an education-driven two-level Engaged/Disengaged labeling policy on the DAiSEE dataset, balancing data via undersampling. The approach achieves an accuracy of 68.57% on the adapted Eng-Confusion labeling, outperforming several prior, more complex methods while remaining lightweight and suitable for online use. This work demonstrates the value of education-specific labeling and targeted feature selection for efficient, real-time engagement monitoring in synchronous classrooms.

Abstract

Attending to learner engagement benefits both learners and instructors. Instructors can help learners increase their attention, involvement, motivation, and interest; they can also improve their own instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the temporal order of the input sequence. For training and testing, we analyzed videos from the publicly available DAiSEE dataset to capture the dynamic nature of learner engagement. We also propose an adaptation policy that derives new labels from the education-related affective states in this dataset, thereby improving the model's judgment. In a specific implementation, the suggested model achieves an accuracy of 68.57% and outperforms the studied state-of-the-art models for detecting learners' engagement levels.
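
The TL;DR mentions per-frame emotion features with $d=7$ and class balancing via undersampling. The following is a minimal sketch of what random undersampling over whole clips could look like, not the authors' procedure. The function name `undersample_binary`, the toy data, the clip length of 30 frames, and the encoding 1 = Engaged, 0 = Disengaged are all assumptions made for illustration. Clips are dropped as units, so the frame order inside each kept clip is left untouched.

```python
import random


def undersample_binary(clips, labels, seed=0):
    """Randomly drop majority-class clips until both classes are equal in size.

    clips  : list of clips, each a list of frames, each frame a list of 7 scores
    labels : list of 0/1 labels, one per clip (hypothetical encoding)
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Size of the minority class decides how many clips survive per class.
    n = min(len(by_class[0]), len(by_class[1]))
    keep = sorted(rng.sample(by_class[0], n) + rng.sample(by_class[1], n))

    # Whole clips are selected; frames inside a clip keep their original order.
    return [clips[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 10 clips, 30 frames each, 7 emotion scores per frame.
data_rng = random.Random(42)
clips = [[[data_rng.random() for _ in range(7)] for _ in range(30)]
         for _ in range(10)]
labels = [1] * 7 + [0] * 3  # imbalanced: 7 Engaged, 3 Disengaged

balanced_clips, balanced_labels = undersample_binary(clips, labels)
print(len(balanced_clips), balanced_labels.count(0), balanced_labels.count(1))
```

Sampling at the clip level rather than the frame level is the design choice that matters here: removing individual frames would break the temporal order that the model is meant to preserve.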

Paper Structure

This paper contains 13 sections, 5 equations, 2 figures, and 8 tables.

Figures (2)

  • Figure 1: Proposed general model for learner engagement detection. (The face images are adapted from Ref43FER2013_Images)
  • Figure 2: Our specific implementation for the proposed model. (The face images are adapted from Ref17)