A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors

Fan Yang

TL;DR

The paper addresses automatic detection of student classroom behaviors from video, focusing on long-tailed class distributions and multi-label scenarios. It proposes BDSTA, a spatio-temporal attention-based detector that builds on SlowFast, augments it with TCS3D, a 3D attention module, and adds FBce, an improved loss that reweights tail classes. Key contributions include the STSCB dataset (8 behaviors, 24,628 frames), the TCS3D attention module (temporal, channel, and spatial paths), and the FBce loss, which blends binary cross-entropy (BCE) with focal loss. Empirically, FBce substantially improves mAP over the SlowFast baseline, and adding TCS3D yields further gains, suggesting practical value for classroom behavior analytics and pedagogy.
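The summary does not give the FBce formula. As a minimal sketch, assuming FBce is a convex combination of per-class BCE and the standard binary focal loss of Lin et al., with a hypothetical mixing weight `lam` plus the usual focusing parameter `gamma` and balance factor `alpha`, the loss for one multi-label clip could be computed as follows. Each class gets its own sigmoid because several behaviors can be present at once.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one class: p = predicted probability, y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: scales the log-loss by (1 - p_t)^gamma."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def fbce(logits, targets, lam=0.5, gamma=2.0, alpha=0.25):
    """Hypothetical FBce: mean over classes of lam * BCE + (1 - lam) * focal.

    logits  -- raw scores, one per behavior class
    targets -- 0/1 labels, one per class (multi-label)
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)  # independent sigmoid per class, not a softmax
        total += lam * bce(p, y) + (1.0 - lam) * focal(p, y, gamma, alpha)
    return total / len(logits)


# Example: 8 behavior classes, two present in the clip (values are illustrative).
logits = [2.0, -1.5, 0.3, -3.0, -2.2, 1.1, -0.7, -4.0]
targets = [1, 0, 0, 0, 0, 1, 0, 0]
print(fbce(logits, targets))
```

Because the focal term multiplies the log-loss by (1 - p_t)^gamma, confidently classified examples contribute almost nothing, which shifts the training signal toward hard examples; in long-tailed data these tend to come from tail classes. The paper's actual weighting scheme may differ from this combination.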

Abstract

Accurately detecting student behavior from classroom videos helps analyze students' classroom status and improve teaching efficiency. However, student classroom behavior detection still commonly suffers from low accuracy. To address this issue, we propose a Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors (BDSTA). First, the SlowFast network generates feature maps carrying motion and environmental information from the video. Then, a spatio-temporal attention module is applied to these feature maps, performing information aggregation, compression, and excitation. This yields attention maps along the temporal, channel, and spatial dimensions, on which multi-label behavior classification is performed. To address the long-tailed class distribution found in student classroom behavior datasets, we use an improved focal loss that assigns more weight to tail-class samples during training. Experiments are conducted on STSCB, a self-built student classroom behavior dataset. Compared with the SlowFast model, BDSTA improves the average accuracy of student behavior classification and detection by 8.94%.
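The abstract names three steps (aggregation, compression, excitation) but not the exact layers. A common reading is the squeeze-and-excitation pattern; the sketch below applies it to a 3D feature map indexed `x[c][t][h][w]` to produce one weight per channel. The reduction ratio `r`, the weight matrices `w1`/`w2`, and the function name are hypothetical, and TCS3D's real channel path may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention_3d(x, w1, w2):
    """Squeeze-and-excitation style channel attention on x[c][t][h][w].

    w1 -- compression weights, shape [C // r][C] (learned in practice)
    w2 -- excitation weights, shape [C][C // r] (learned in practice)
    Returns (reweighted feature map, per-channel weights).
    """
    # 1) Aggregation: global average pooling over time, height and width.
    pooled = []
    for ch in x:
        vals = [v for frame in ch for row in frame for v in row]
        pooled.append(sum(vals) / len(vals))
    # 2) Compression: fully connected layer down to C // r units, then ReLU.
    hidden = [max(0.0, sum(wi * p for wi, p in zip(row, pooled))) for row in w1]
    # 3) Excitation: fully connected layer back to C units, then sigmoid.
    weights = [sigmoid(sum(wi * h for wi, h in zip(row, hidden))) for row in w2]
    # 4) Rescale every value of channel c by its attention weight.
    out = [[[[v * a for v in row] for row in frame] for frame in ch]
           for ch, a in zip(x, weights)]
    return out, weights


# Toy feature map: 4 channels, 2 frames, 3x3 positions, reduction ratio 2.
random.seed(0)
C, T, H, W, r = 4, 2, 3, 3, 2
x = [[[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
     for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, a = channel_attention_3d(x, w1, w2)
print([round(v, 3) for v in a])
```

The bottleneck (C to C // r and back) keeps the attention branch cheap while still letting channels interact before each receives its weight.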

Paper Structure

This paper contains 12 sections, 13 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: TCS3D Temporal, Channel, Spatial Convolution Attention Module.
  • Figure 2: The network structure of the student classroom behavior detection based on spatio-temporal attention.
  • Figure 3: Number of labels for each class in the dataset.
  • Figure 4: 3D Temporal Attention Module.
  • Figure 5: 3D Channel Attention Module.
  • Figures 6 to 11 are not listed here.
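Figure 4 refers to a 3D temporal attention module whose internals this summary does not describe. The sketch below is a hypothetical design that borrows the average-plus-max pooling idea from CBAM's spatial attention and applies it along the time axis: each frame is summarized over channels and positions, a small 1D convolution over time mixes neighboring frames, and a sigmoid yields one weight per frame. The kernel size, kernel values, and function name are assumptions.

```python
import math


def temporal_attention_3d(x, k_avg, k_max):
    """Hypothetical temporal attention: one weight per frame t of x[c][t][h][w].

    k_avg, k_max -- odd-length 1D kernels applied along time to the average-
                    and max-pooled frame descriptors (learned in practice).
    """
    C, T = len(x), len(x[0])
    # Aggregate each frame over channels and spatial positions.
    avg_t, max_t = [], []
    for t in range(T):
        vals = [v for c in range(C) for row in x[c][t] for v in row]
        avg_t.append(sum(vals) / len(vals))
        max_t.append(max(vals))

    def conv1d(seq, kernel):
        # 'Same'-length 1D cross-correlation with zero padding.
        half = len(kernel) // 2
        return [sum(kernel[j] * seq[t + j - half]
                    for j in range(len(kernel)) if 0 <= t + j - half < len(seq))
                for t in range(len(seq))]

    logits = [a + m for a, m in zip(conv1d(avg_t, k_avg), conv1d(max_t, k_max))]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Rescale every value in frame t by that frame's weight.
    out = [[[[v * weights[t] for v in row] for row in x[c][t]] for t in range(T)]
           for c in range(C)]
    return out, weights


# Toy clip: 2 channels, 4 frames of 2x2; every value in frame t equals t.
x = [[[[float(t)] * 2 for _ in range(2)] for t in range(4)] for _ in range(2)]
out, weights = temporal_attention_3d(x, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
print([round(w, 3) for w in weights])
```

With an identity kernel on the average path, frames with stronger activations receive larger weights; a trained module would learn kernels that emphasize the frames in which a behavior actually occurs.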