Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder

Marie Huynh, Aaron Kline, Saimourya Surabhi, Kaitlyn Dunlap, Onur Cezmi Mutlu, Mohammadmahdi Honarmand, Parnian Azizian, Peter Washington, Dennis P. Wall

TL;DR

The paper addresses early autism spectrum disorder (ASD) detection using naturalistic home videos collected via the GuessWhat app. It proposes a multimodal time-series approach that extracts eye gaze, head pose, and facial landmarks, trains LSTM/GRU models on each modality, and applies late fusion to boost diagnostic accuracy. Individual modalities yield test AUCs of 0.86 (eye gaze), 0.67 (head pose), and 0.78 (facial landmarks), while late fusion achieves an AUC of 0.90 with improved fairness across age and gender groups. The work demonstrates a scalable, equity-aware framework for remote ASD phenotype characterization and points to future enhancements with additional modalities and better bias mitigation.

Abstract

Early detection of autism, a neurodevelopmental disorder marked by social communication challenges, is crucial for timely intervention. Recent advancements have utilized naturalistic home videos captured via the mobile application GuessWhat. Through interactive games played between children and their guardians, GuessWhat has amassed over 3,000 structured videos from 382 children, both with and without a diagnosis of Autism Spectrum Disorder (ASD). This collection provides a robust dataset for training computer vision models to detect ASD-related phenotypic markers, including variations in emotional expression, eye contact, and head movements. We have developed a protocol to curate high-quality videos from this dataset, forming a comprehensive training set. Utilizing this set, we trained individual LSTM-based models using eye gaze, head positions, and facial landmarks as input features, achieving test AUCs of 86%, 67%, and 78%, respectively. To boost diagnostic accuracy, we applied late fusion techniques to create ensemble models, improving the overall AUC to 90%. This approach also yielded more equitable results across different genders and age groups. Our methodology offers a significant step forward in the early detection of ASD by potentially reducing the reliance on subjective assessments and making early identification more accessible and equitable.
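
To make the late fusion step concrete, the sketch below combines per-modality probabilities by weighted averaging and scores the result with AUC. It is only an illustration under stated assumptions: the abstract does not specify the fusion rule, so simple averaging is assumed here, the function names `late_fusion` and `roc_auc` are hypothetical, and the numbers are synthetic toy values rather than results from the paper.

```python
import numpy as np


def late_fusion(probs_by_modality, weights=None):
    """Fuse per-modality ASD probabilities with a weighted average.

    Assumption: averaging is one common late fusion rule; the paper's
    exact rule is not stated in the abstract.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in probs_by_modality])
    if weights is None:
        weights = np.ones(stacked.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the output stays in [0, 1]
    return weights @ stacked


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct) / (pos.size * neg.size)


# Synthetic toy data (NOT from the paper): four videos, 1 = ASD, 0 = non-ASD.
labels = [1, 1, 0, 0]
eye_probs = [0.9, 0.4, 0.5, 0.1]   # hypothetical eye gaze model outputs
head_probs = [0.6, 0.7, 0.3, 0.6]  # hypothetical head pose model outputs
face_probs = [0.8, 0.6, 0.4, 0.5]  # hypothetical facial landmark model outputs

fused = late_fusion([eye_probs, head_probs, face_probs])
for name, scores in [("eye", eye_probs), ("head", head_probs),
                     ("face", face_probs), ("fused", fused)]:
    print(name, roc_auc(labels, scores))
```

In this toy case each single modality misranks or ties at least one pair of videos, while the averaged score separates the two classes, which is the intuition behind fusing modalities after each model has made its own prediction.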

Paper Structure (19 sections, 9 figures, 15 tables, 3 algorithms)

This paper contains 19 sections, 9 figures, 15 tables, 3 algorithms.

Figures (9)

  • Figure 1: Presence of Superusers in the ASD Class. Some children with ASD dominate the data with dozens of videos.
  • Figure 2: Key Filtering Steps
  • Figure 3: Feature Extraction Scheme. Every feature vector comes with a confidence score ranging from 0 to 100.
  • Figure 4: Key Feature Engineering Steps
  • Figure 5: Normalization and Missingness Examples for Head Features.
  • ...and 4 more figures