
Early Autism Diagnosis based on Path Signature and Siamese Unsupervised Feature Compressor

Zhuowen Yin, Xinyao Ding, Xin Zhang, Zhengwang Wu, Li Wang, Xiangmin Xu, Gang Li

TL;DR

This work tackles early autism diagnosis from infant structural MRI when data are scarce and class-imbalanced. It introduces a three-part pipeline: Path Signature–based longitudinal feature extraction, a dual-channel unsupervised feature compressor with feature binarization, and a multi-task Siamese verification framework with region-weighted voting and interpretability through region importance analysis. On the NDAR/IBIS infant dataset, the approach improves recall and F1 scores over baselines and identifies anatomically plausible regions (e.g., left superior frontal gyrus, caudal anterior cingulate, STS/STG) as predictive, offering both predictive power and developmental insights. The methodology provides a practical, data-efficient path toward early ASD screening from MRI and sets the stage for future multimodal and graph-based extensions.

Abstract

Autism Spectrum Disorder (ASD) has emerged as a growing public health concern. Early diagnosis of ASD is crucial for timely, effective intervention and treatment. However, conventional diagnostic methods based on communication and behavioral patterns are unreliable for children younger than 2 years of age. Given evidence of neurodevelopmental abnormalities in infants with ASD, we propose a novel deep learning-based method that extracts key features from the inherently scarce, class-imbalanced, and heterogeneous structural MR images for early autism diagnosis. Specifically, we propose a Siamese verification framework to expand the scarce data, and an unsupervised compressor to alleviate data imbalance by extracting key features. We also propose weight constraints that cope with sample heterogeneity by giving different samples different voting weights during validation, and we use Path Signature to extract meaningful developmental features longitudinally from the two-time-point data. We further identify the brain regions that the model focuses on for autism diagnosis. Extensive experiments show that our method performs well under practical scenarios, outperforming existing machine learning methods and providing anatomical insights for early autism diagnosis.
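
To make the Path Signature step concrete, the following is a minimal sketch of a depth-2 signature for a piecewise-linear path, written in plain Python. It is not the authors' implementation: the truncation depth, the choice of an (age, feature value) path, and the names `signature_depth2` and `longitudinal_features` are all assumptions made for illustration. For a path with only two time points there is a single linear segment, so the level-1 term is the increment and the level-2 term is half its outer product with itself.

```python
def signature_depth2(points):
    """Depth-2 signature of the piecewise-linear path through `points`.

    `points` is a sequence of equal-length tuples. Returns (level1, level2),
    where level1 has d entries and level2 is a d x d nested list.
    """
    d = len(points[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # level 2 gains (old level 1) x delta + 0.5 * delta x delta.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


def longitudinal_features(ages, first_scan, second_scan):
    """Hypothetical helper: concatenate the original features with
    per-feature signature terms of the (age, value) path."""
    out = list(first_scan) + list(second_scan)
    for x1, x2 in zip(first_scan, second_scan):
        path = [(ages[0], x1), (ages[1], x2)]
        level1, level2 = signature_depth2(path)
        out.extend(level1)
        for row in level2:
            out.extend(row)
    return out
```

Calling `longitudinal_features((0.5, 1.0), [2.0, 3.0], [2.5, 2.75])` with made-up ages and measurements returns the four original values followed by six signature terms per feature. With more than two time points the same function would also capture order-dependent terms such as the Lévy area, which a single segment cannot express.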

Paper Structure (22 sections, 13 equations, 7 figures, 6 tables, 1 algorithm)

This paper contains 22 sections, 13 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) The overall algorithm flow of the training process, which is divided into three parts: feature extraction, feature compression, and Siamese verification. In the feature extraction part, we extract the longitudinal Path Signature (PS) of the features and concatenate it with the original morphological features. In the feature compression part, we train stacked dual auto-encoders (the Compressor) to compress the input and retain useful features. In the Siamese verification part, we train a multi-task learning-based Siamese verification model to ensure the validity of the classification results. (b) The detailed structure of concatenating input features with PS features.
  • Figure 2: Schematic diagram of the autoencoder training process introduced in Algorithm 2. We train layers 1 and 4 first, then freeze them, and insert layers 2 and 3 to train. We ultimately obtain layers 1 and 2 as the feature Compressor for further training, which outputs compressed low-dimensional feature vectors.
  • Figure 3: The schematic diagram of the multi-task Siamese verification model during training. Feature names are listed on the right side of the figure.
  • Figure 4: Test phase of the Siamese verification model. The test similarity result is obtained by verifying the test sample against all training samples and then voting. Classification modules are used only in the training phase to calculate the integrated loss and are not used in the test phase.
  • Figure 5: The importance iteration for the $i$-th layer. Step 1: calculate $w_{i,j,k}'$ with the importance factor $I_{i,j}$ and set the values below the median to 0. Step 2: calculate the previous layer's importance factor $I_{i-1,k}$ as the weighted sum of $I_{i,j}$ and $w_{i,j,k}'$.
  • ...and 2 more figures
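
The two-step importance iteration described in the Figure 5 caption can be sketched as below. The caption does not give the exact formulas, so this sketch makes several assumptions that should not be read as the authors' method: $w_{i,j,k}'$ is taken to be $I_{i,j}\,|w_{i,j,k}|$, the median is taken over all entries of the layer, and $I_{i-1,k}$ is taken to be $\sum_j I_{i,j}\, w_{i,j,k}'$. The function name `propagate_importance` is hypothetical.

```python
from statistics import median


def propagate_importance(weights, importance):
    """One backward importance step for layer i (illustrative assumptions).

    weights[j][k] : weight from input unit k to output unit j of layer i.
    importance[j] : importance factor I_{i,j} of output unit j.
    Returns a list holding I_{i-1,k} for each input unit k.
    """
    # Step 1 (assumed form): scale weight magnitudes by the output
    # unit's importance, then zero every entry below the layer median.
    scaled = [[importance[j] * abs(w) for w in row]
              for j, row in enumerate(weights)]
    threshold = median([v for row in scaled for v in row])
    pruned = [[v if v >= threshold else 0.0 for v in row] for row in scaled]

    # Step 2 (assumed form): importance-weighted sum over output units.
    n_inputs = len(weights[0])
    return [sum(importance[j] * pruned[j][k] for j in range(len(weights)))
            for k in range(n_inputs)]
```

Applying this function layer by layer, starting from the output and ending at the first layer, would yield one importance value per input feature, which could then be mapped back to brain regions. Whether the paper normalizes the factors between layers is not stated in the caption, so no normalization is done here.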