Explainable Artificial Intelligence for Quantifying Interfering and High-Risk Behaviors in Autism Spectrum Disorder in a Real-World Classroom Environment Using Privacy-Preserving Video Analysis

Barun Das, Conor Anderson, Tania Villavicencio, Johanna Lantz, Jenny Foster, Theresa Hamlin, Ali Bahrami Rad, Gari D. Clifford, Hyeokhyen Kwon

TL;DR

This work demonstrates a privacy-preserving, explainable AI framework for quantifying interfering and high-risk ASD behaviors in real-world classrooms using ambient video. By leveraging multi-person 2D pose estimation, Hungarian tracking, and hierarchical attention (body-joint, temporal, and person) within 4-second windows, the approach yields interpretable video-level representations that can detect target behaviors with a 77% F1-score for top-down camera views. Key findings show that a privacy-focused, pose-based analysis can operate in real classrooms and identify the most relevant individuals, though performance is constrained by data sparsity and camera viewpoint, with 3-minute predictive horizons proving substantially more challenging. The work lays groundwork for scalable, automated behavior monitoring in educational settings, offering a path toward longitudinal studies and reduced staff burden, while outlining concrete extensions like active learning, multi-modal data, and edge deployment to broaden applicability.

Abstract

Rapid identification and accurate documentation of interfering and high-risk behaviors in ASD, such as aggression, self-injury, disruption, and restricted repetitive behaviors, are important in daily classroom environments for tracking intervention effectiveness and allocating appropriate resources to manage care needs. However, having staff dedicated solely to observation is costly and uncommon in most educational settings. Recently, multiple studies have explored automated, continuous, and objective machine learning tools for quantifying behaviors in ASD. However, most of this work was conducted in controlled environments and has not been validated under real-world conditions. In this work, we demonstrate that the latest advances in video-based group activity recognition can quantify behaviors in ASD during real-world classroom activities while preserving privacy. Our explainable model detected episodes of problem behaviors with a 77% F1-score and captured distinctive features of different types of behaviors in ASD. To the best of our knowledge, this is the first work that shows the promise of objectively quantifying behaviors in ASD in a real-world environment, which is an important step toward the development of a practical tool that can ease the burden of data collection for classroom staff.

Paper Structure (21 sections, 7 figures, 2 tables)

This paper contains 21 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Two different camera views capturing the classroom table in our study.
  • Figure 2: Distribution of the total duration (in minutes) of target behaviors observed in our labeled dataset.
  • Figure 3: Distribution of behavioral episodes for each subject.
  • Figure 4: Overall analysis pipeline for group activities in the classroom (example with a top-down view). Within each 4-second analysis window, 2D poses of the subjects in the scene are captured using the DEKR model (Geng et al., 2021), and each subject's activity over the video clip is collected through multi-person tracking based on the Hungarian matching algorithm (Kuhn, 1955). The proposed method first identifies the important joint movements by processing the pose sequence with the Body Joint Attention model. The joint-weighted pose sequences are then processed with a Temporal Convolutional Network (TCN) to extract pose features across all frames. The TCN feature sequences are then processed with the Temporal Attention model, which performs temporal pooling with higher weights on the frames most relevant to identifying atypical behaviors in the 4-second window. The temporally pooled features for each individual are then processed with the Person Attention model to identify the subject most likely to exhibit atypical behaviors. The person-attention weights are used to aggregate all individuals' features into video-level features, which are used to identify the presence of a target behavior in each 4-second analysis window.
  • Figure 5: Mean true positive rate (TPR) for different behaviors. The TPR for each behavior is correlated with the number of available samples in each category (see Figure 2).
  • ...and 2 more figures
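
The temporal and person attention stages described in the Figure 4 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each attention stage scores a feature vector by a dot product with a scoring vector (`temporal_vec` and `person_vec` are hypothetical stand-ins for learned parameters), and it omits the Body Joint Attention model and the TCN, so the per-frame features are toy values rather than TCN outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scoring_vector):
    """Pool a list of feature vectors into a single vector.

    Each vector receives a scalar relevance score (dot product with a
    hypothetical scoring vector), softmax turns the scores into weights,
    and the output is the weighted sum of the input vectors.
    Returns (pooled_vector, weights).
    """
    scores = [sum(f * w for f, w in zip(vec, scoring_vector)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(wt * vec[d] for wt, vec in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def video_level_feature(window, temporal_vec, person_vec):
    """Aggregate one analysis window into a video-level feature.

    window[p][t] is the feature vector of person p at frame t.
    Step 1 (temporal attention): pool each person's frames into one vector.
    Step 2 (person attention): pool the per-person vectors into one vector.
    Returns (video_feature, person_weights).
    """
    per_person = [attention_pool(frames, temporal_vec)[0] for frames in window]
    return attention_pool(per_person, person_vec)


# Toy window: 2 people, 3 frames, 2-dimensional features (assumed values).
window = [
    [[0.1, 0.0], [0.2, 0.1], [0.1, 0.0]],  # person 0: little movement
    [[0.1, 0.0], [2.0, 1.5], [0.3, 0.2]],  # person 1: burst of movement in frame 1
]
temporal_vec = [1.0, 1.0]  # hypothetical scoring parameters
person_vec = [1.0, 1.0]    # hypothetical scoring parameters

video_feature, person_weights = video_level_feature(window, temporal_vec, person_vec)
print("person attention weights:", person_weights)
print("video-level feature:", video_feature)
```

In this toy example the person with the burst of movement receives the larger person-attention weight, which mirrors how the caption describes the weights being used to point to the subject most likely to exhibit the target behavior.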