Measuring Student Behavioral Engagement using Histogram of Actions

Ahmed Abdelkawy, Aly Farag, Islam Alkabbany, Asem Ali, Chris Foreman, Thomas Tretter, Nicholas Hindy

TL;DR

This paper addresses the challenge of measuring classroom behavioral engagement by automatically recognizing student actions from upper-body skeleton heatmaps with a 3D-CNN, constructing a histogram of action frequencies over 2-minute segments, and predicting engagement using a gaze-augmented feature fed into a random forest. It introduces an Actions Dictionary, applies transfer learning via PoseConv3D for classroom action recognition, and uses kernel density estimation to synthesize disengaged samples to balance the training data. The authors provide a new public-style dataset with 1414 action-annotated segments (13 actions) and 112 engagement-annotated segments, achieving top-1 action accuracy of 86.32% and a weighted F1 around 0.90 for engagement when using synthesized samples. Overall, the method yields interpretable, scalable engagement estimates and demonstrates noticeable improvement over prior baselines, offering practical potential for instructor feedback and classroom analytics.

Abstract

In this paper, we propose a novel technique for measuring behavioral engagement through student action recognition. The proposed approach recognizes student actions and then predicts the student's behavioral engagement level. For student action recognition, we use human skeletons to model student postures and upper-body movements. To learn the dynamics of the student's upper body, a 3D-CNN model is used. The trained 3D-CNN model recognizes actions within every 2-minute video segment, and these actions are then used to build a histogram of actions that encodes the student's actions and their frequencies. This histogram is used as input to an SVM classifier that classifies the student as engaged or disengaged. To evaluate the proposed framework, we build a dataset consisting of 1414 2-minute video segments annotated with 13 actions and 112 video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 83.63% and that the proposed framework can capture the average engagement of the class.
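The histogram of actions described above can be illustrated with a minimal sketch. It assumes the 3D-CNN has already produced one action index per clip within a 2-minute segment; the function name `histogram_of_actions` and the example labels are hypothetical, and normalizing counts to frequencies is an assumption based on the "normalized frequency" axis mentioned in the Figure 5 caption, not the authors' code.

```python
import numpy as np

NUM_ACTIONS = 13  # the dataset is annotated with 13 actions


def histogram_of_actions(clip_labels, num_actions=NUM_ACTIONS):
    """Return the normalized frequency of each action index in one segment.

    clip_labels: iterable of integer action indices, one per recognized clip
    (hypothetical input format; the paper does not specify it here).
    """
    labels = np.asarray(clip_labels, dtype=int)
    if labels.size == 0:
        # No recognized actions: return an all-zero histogram.
        return np.zeros(num_actions)
    if labels.min() < 0 or labels.max() >= num_actions:
        raise ValueError("action index out of range")
    counts = np.bincount(labels, minlength=num_actions).astype(float)
    return counts / counts.sum()


# Made-up predictions for one segment (1: writing, 4: reading, ...).
segment_labels = [1, 1, 4, 1, 0, 3, 3, 1]
feature = histogram_of_actions(segment_labels)
```

The resulting fixed-length vector is the kind of feature that could then be passed to an engagement classifier such as the SVM mentioned in the abstract.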

Paper Structure (18 sections, 1 equation, 7 figures, 7 tables)

Figures (7)

  • Figure 1: The proposed framework for measuring student engagement. First, a 3D-CNN model is used to classify actions in each 2-minute video segment. Also, head poses relative to the target (e.g., blackboard) are estimated. Then the histogram of actions is computed. Finally, a classifier determines the student engagement level based on the extracted histogram of actions. IoU denotes intersection over union.
  • Figure 2: Dictionary of the proposed tokens.
  • Figure 3: Examples of student actions in a classroom.
  • Figure 4: The 3D-CNN architecture for action recognition. Given a student video clip, skeleton joints are extracted from each frame, and a volume of 3D pseudo heatmaps is generated by stacking the heatmaps of the student's upper-body skeleton joints along the temporal dimension. This volume is then used as input to the 3D-CNN model, which consists of 5 layers, to recognize the action it contains.
  • Figure 5: Examples of student behavioral engagement features. (a) and (b) are engaged samples, while (c) and (d) are disengaged samples. The x-axis represents the action index and the y-axis represents the normalized frequency of that action. The actions are 0: supporting head, 1: writing, 2: typing on a keyboard, 3: playing with phone/tablet, 4: reading, 5: raising hands, 6: cross arms, 7: wipe face, 8: drink water, 9: eat meal/snack, 10: check time, 11: fiddling with hair, 12: yawn, 13: look off target, 14: look at target.
  • ...and 2 more figures
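
The heatmap volume described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each joint is rendered as a Gaussian blob scaled by its confidence score, and the number of joints, the image size, and `sigma` are hypothetical values chosen for the example. The function names `joint_heatmaps` and `heatmap_volume` are likewise made up here.

```python
import numpy as np


def joint_heatmaps(joints_xy, scores, height, width, sigma=2.0):
    """Render one frame's joints as Gaussian heatmaps.

    joints_xy: (K, 2) array of (x, y) pixel coordinates.
    scores:    (K,) array of joint confidences.
    Returns an array of shape (K, height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(joints_xy), height, width), dtype=np.float32)
    for k, ((x, y), c) in enumerate(zip(joints_xy, scores)):
        # Gaussian centered on the joint, scaled by its confidence (assumption).
        maps[k] = c * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps


def heatmap_volume(clip_joints, clip_scores, height, width, sigma=2.0):
    """Stack per-frame heatmaps along the temporal dimension.

    clip_joints: (T, K, 2), clip_scores: (T, K).
    Returns a volume of shape (K, T, height, width).
    """
    frames = [
        joint_heatmaps(j, s, height, width, sigma)
        for j, s in zip(clip_joints, clip_scores)
    ]
    return np.stack(frames, axis=1)


# Toy clip: 4 frames, 3 hypothetical upper-body joints, 16x16 heatmaps.
joints = np.zeros((4, 3, 2))
joints[..., 0] = [2, 8, 12]   # x coordinates of the three joints
joints[..., 1] = [3, 8, 10]   # y coordinates of the three joints
scores = np.ones((4, 3))
volume = heatmap_volume(joints, scores, height=16, width=16)
```

A volume laid out this way, with joints as channels and time as the depth axis, is the kind of input a 3D-CNN can convolve over jointly in space and time.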