Detecting Children with Autism Spectrum Disorder based on Script-Centric Behavior Understanding with Emotional Enhancement

Wenxing Liu; Yueran Pan; Dong Zhang; Hongzhu Deng; Xiaobing Zou; Ming Li

TL;DR

The paper tackles early ASD detection from limited audio-visual data by converting videos into textual behavior scripts and applying large language models (LLMs) in zero-shot and few-shot settings. It introduces a three-module Script-Centric Behavior Understanding (SCBU) pipeline (Behavior Transcription, Script Transcription, and emotion-aware domain prompting) in which multiple LLMs collaborate to reach high diagnostic performance and produce interpretable rationales. Key contributions include the script transcription framework, the textualization of emotional dynamics, and a domain-prompting strategy that injects clinical ASD knowledge into the LLMs. Experimental results on a dataset of toddler-age children show state-of-the-art zero-shot and few-shot performance (F1 up to 95.24%) together with explainable detection rationales, suggesting practical potential for clinical screening and decision support.
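To make "emotion textualization" concrete, the sketch below turns a per-frame sequence of emotion labels into sentences an LLM can read, by merging consecutive frames that share a label into time segments. It is a minimal illustration under stated assumptions, not the authors' module: the names `EmotionFrame` and `textualize_emotions`, the fixed frame `step`, and the idea that labels come from an upstream facial-expression classifier are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class EmotionFrame:
    t: float     # frame timestamp in seconds
    label: str   # emotion label from an upstream classifier (assumed)


def textualize_emotions(frames: List[EmotionFrame], step: float,
                        subject: str = "the child") -> str:
    """Merge consecutive frames with the same label into segments and
    describe each segment as one sentence (hypothetical helper)."""
    sentences = []
    # groupby only merges *adjacent* frames, so a label that reappears later
    # starts a new segment; this preserves the emotional dynamics over time.
    for label, group in groupby(frames, key=lambda f: f.label):
        seg = list(group)
        start = seg[0].t
        end = seg[-1].t + step  # last frame covers one step of time
        sentences.append(
            f"From {start:.1f}s to {end:.1f}s, {subject} appears {label}."
        )
    return " ".join(sentences)
```

For example, three "neutral" frames at 0, 1 and 2 s followed by two "happy" frames at 3 and 4 s (with `step=1.0`) become two sentences, one per segment, which can then be placed next to the behavior script as emotional context.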

Abstract

The early diagnosis of autism spectrum disorder (ASD) depends critically on the systematic observation and analysis of children's social behaviors. While current methods predominantly rely on supervised learning, their clinical adoption faces two principal limitations: insufficient ASD diagnostic samples and inadequate interpretability of the detection outcomes. This paper presents a novel zero-shot ASD detection framework based on script-centric behavioral understanding with emotional enhancement, designed to overcome these clinical constraints. The proposed pipeline automatically converts audio-visual data into structured behavioral text scripts through computer vision techniques and then exploits the generalization capabilities of large language models (LLMs) for zero-shot and few-shot ASD detection. Three core technical contributions are introduced: (1) a multimodal script transcription module that transforms behavioral cues into structured textual representations; (2) an emotion textualization module that encodes emotional dynamics as contextual features to augment behavioral understanding; and (3) a domain-specific prompt engineering strategy that injects clinical knowledge into LLMs. Our method achieves an F1-score of 95.24% in diagnosing ASD in children with an average age of two years while generating interpretable detection rationales. This work opens new avenues for leveraging LLMs to analyze and understand ASD-related human behavior, thereby improving the accuracy of assisted autism diagnosis.
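The script-centric idea (time-stamped behavioral cues rendered as text, then wrapped in a prompt that carries clinical knowledge and optional few-shot examples) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `BehaviorEvent`, `transcribe_script`, `build_prompt`, `parse_label`, the "Label: ASD / Label: TD" answer format (TD = typically developing) and the sample knowledge strings are all assumptions. The actual LLM call is deliberately left out, because the models and APIs used by the authors are not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class BehaviorEvent:
    start: float       # seconds
    end: float         # seconds
    actor: str         # e.g. "Child" or "Examiner" (hypothetical roles)
    description: str   # textual cue produced by upstream detectors (assumed)


def transcribe_script(events: Sequence[BehaviorEvent]) -> str:
    """Order events by time and render one script line per event."""
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    return "\n".join(
        f"[{e.start:.1f}-{e.end:.1f}s] {e.actor}: {e.description}" for e in ordered
    )


def build_prompt(script: str, emotion_text: str, knowledge: Sequence[str],
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a domain prompt; an empty `examples` gives the zero-shot case."""
    parts = ["You are assisting with autism spectrum disorder (ASD) screening.",
             "Clinical knowledge:\n" + "\n".join(f"- {k}" for k in knowledge)]
    for i, (ex_script, ex_label) in enumerate(examples, 1):  # few-shot examples
        parts.append(f"Example {i} script:\n{ex_script}\nExample {i} label: {ex_label}")
    parts.append("Behavior script:\n" + script)
    if emotion_text:
        parts.append("Emotional context: " + emotion_text)
    parts.append("Answer 'Label: ASD' or 'Label: TD' on the first line, "
                 "then explain your rationale.")
    return "\n\n".join(parts)


def parse_label(reply: str) -> Optional[str]:
    """Read the label from the first line of an LLM reply; None if malformed."""
    text = reply.strip()
    if not text:
        return None
    first = text.splitlines()[0]
    if first.upper().startswith("LABEL:"):
        value = first.split(":", 1)[1].strip().upper()
        if value in {"ASD", "TD"}:
            return value
    return None
```

Keeping the label on a fixed first line makes the prediction machine-readable while the free text that follows serves as the interpretable rationale; passing a non-empty `examples` list switches the same prompt from zero-shot to few-shot.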