Weakly-supervised Autism Severity Assessment in Long Videos

Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thümmler

TL;DR

This paper tackles autism severity assessment from long, untrimmed videos under weak supervision by learning typical versus atypical behavioral biomarkers. It introduces a three-stage architecture: (1) a Visual Encoder that extracts VideoMAE-v2/DinoV2 features; (2) a weakly-supervised ASD detector, built on weakly-supervised temporal action localization (WTAL), that combines an Outlier Embedder, a Cross-Temporal Scale Transformer (CTST), and a Detector to identify ASD segments; and (3) a shallow TCN-MLP severity regressor that maps the learned biomarkers to ADOS-based severity scores. On real-world clinical data, the approach separates typical from ASD patterns better than several temporal action localization (TAL) baselines and produces automatic severity estimates. By relying on weak supervision and long-video biomarkers, the method offers a scalable, non-invasive aid for clinicians in early ASD detection and ongoing assessment. It thereby supports objective, multi-biomarker analysis of untrimmed videos, with potential for broader clinical impact.

Abstract

Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and reciprocal interactions, as well as repetitive and stereotypical behaviors. Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD. In this paper, we propose a video-based weakly-supervised method that takes spatio-temporal features of long videos to learn typical and atypical behaviors for autism detection. On top of that, we propose a shallow TCN-MLP network, which is designed to further categorize the severity score. We evaluate our method on actual evaluation videos of children with autism collected and annotated (for severity score) by clinical professionals. Experimental results demonstrate the effectiveness of behavioral biomarkers that could help clinicians in autism spectrum analysis.
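
To make the last stage more concrete, the following is a minimal, untrained sketch of what a shallow "temporal convolution, then MLP" severity head could look like. It is not the authors' implementation. The segment count (T = 32) and the 128-dimensional detector features are taken from the Figure 1 caption; the hidden size, kernel width, mean pooling, scalar output, and all function names are assumptions made only for illustration.

```python
import random

T, D_DET = 32, 128      # segments and detector feature size, from the Figure 1 caption
HIDDEN, KERNEL = 16, 3  # hypothetical sizes chosen only for this sketch


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def temporal_conv(x, weights):
    """1D convolution over time with zero padding (same length), then ReLU.

    x: T rows of D features. weights: KERNEL matrices, each D x HIDDEN.
    """
    t_len, pad = len(x), len(weights) // 2
    hidden = len(weights[0][0])
    out = []
    for t in range(t_len):
        acc = [0.0] * hidden
        for k, w_k in enumerate(weights):
            src = t + k - pad
            if 0 <= src < t_len:  # positions outside the video act as zeros
                for d, value in enumerate(x[src]):
                    row = w_k[d]
                    for h in range(hidden):
                        acc[h] += value * row[h]
        out.append([max(0.0, a) for a in acc])
    return out


def severity_head(x, conv_w, mlp_w, mlp_b):
    """Temporal conv -> mean pooling over time -> linear layer -> scalar."""
    h = temporal_conv(x, conv_w)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return sum(p * w for p, w in zip(pooled, mlp_w)) + mlp_b


rng = random.Random(0)
features = init_matrix(T, D_DET, rng)  # random stand-in for detector features
conv_w = [init_matrix(D_DET, HIDDEN, rng) for _ in range(KERNEL)]
mlp_w = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
score = severity_head(features, conv_w, mlp_w, 0.0)
print(f"untrained severity output: {score:.4f}")
```

The sketch only shows how a T x 128 feature sequence can be reduced to a single video-level output. Because the weights are random, the printed value carries no clinical meaning, and a head that predicts discrete severity levels would replace the scalar output with one logit per level.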

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The network comprises three major stages: (A) Visual Encoder, (B) Weakly-supervised ASD Detector, which detects typical and atypical behaviors, and (C) Severity Score Regressor, which regresses the final severity score. Here, $F_O$ is the one-class feature map, $F_M$ is the feature map of the mixed distribution, $T = 32$ is the number of temporal segments, $D = 1408$ is the feature dimension, and $128$ is the size of the feature vector from the detector's final layer. $nm$ denotes the $m$ video features obtained from the $n$ levels of the CTST module.
  • Figure 2: Analysis of the WTAL $T \times D$ features for 4 randomly selected participants from each severity level, where $T = 16$ and $D = 128$ (feature vector size). The density of the heatmap indicates atypical biomarkers: a higher density corresponds to a higher severity score.
  • Figure 3: Confusion matrix for severity score assessment.