Involution Fused ConvNet for Classifying Eye-Tracking Patterns of Children with Autism Spectrum Disorder
Md. Farhadul Islam, Meem Arafat Manab, Joyanta Jyoti Mondal, Sarah Zabeen, Fardin Bin Rahman, Md. Zahidul Hasan, Farig Sadeque, Jannatun Noor
TL;DR
This work tackles the challenging task of diagnosing Autism Spectrum Disorder (ASD) from eye-tracking gaze patterns. It introduces a hybrid Involution-Convolution neural network that places three involution layers before the convolutional blocks so that the model can learn location-specific spatial cues in the gaze data. Evaluated on two public datasets with data augmentation, the model achieves near state-of-the-art accuracy (about 99.4% on Dataset 1 and about 96.8% on Dataset 2) while keeping a small footprint of about 1.36 MB. The results demonstrate that combining involution with convolution yields strong performance from a compact model, which opens the door to edge deployment for eye-tracking-based ASD screening.
Abstract
Autism Spectrum Disorder (ASD) is a complex neurological condition that is challenging to diagnose. Numerous studies demonstrate that children diagnosed with autism struggle to maintain attention and have less focused vision. Eye-tracking technology has drawn special attention in the context of ASD, since gaze anomalies have long been recognized as a defining feature of autism. Deep Learning (DL) approaches coupled with eye-tracking sensors offer new capabilities for advancing diagnosis and its applications. By learning intricate nonlinear input-output relations, DL models can accurately recognize various gaze and eye-tracking patterns and adapt to the data. Convolutions alone, however, are insufficient to capture the important spatial information in gaze patterns or eye-tracking data. Involution, a dynamic kernel-based operation, can improve the efficiency of classifying such data. In this paper, we use two different image-processing operations, involution and convolution, to examine how each learns eye-tracking patterns. Because these patterns depend primarily on spatial information, we combine involution with convolution in a hybrid model, which adds location-specific capability to a deep learning model. Our proposed model follows a simple yet effective design, which makes it easier to apply in real-world settings. We also investigate why our approach works well for classifying eye-tracking patterns. For comparative analysis, we experiment with two separate datasets as well as a combined version of both. The results show that the hybrid Involution-Convolution (IC) model with three involution layers outperforms previous approaches.
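The abstract describes involution only at a high level. As a rough illustration of what "location-specific" means here, the NumPy sketch below follows the general involution formulation: a small two-layer bottleneck generates a K x K kernel at every pixel from that pixel's own features, and each kernel is shared across the channels of its group. This is not the authors' implementation. The function name `involution2d`, the weight names `w_reduce` and `w_span`, the ReLU, the reduction ratio `r`, the channels-last single-image layout, and the zero padding are all illustrative assumptions.

```python
import numpy as np


def involution2d(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Minimal single-image involution (channels-last); illustrative sketch only.

    x:        (H, W, C) feature map
    w_reduce: (C, C // r) weights of the first kernel-generating linear map (assumed)
    w_span:   (C // r, K * K * G) weights of the second linear map (assumed)
    Returns an array with the same shape as x.
    """
    h, w, c = x.shape
    k, pad = kernel_size, kernel_size // 2
    assert c % groups == 0, "channels must divide evenly into groups"

    # 1) Generate a K x K kernel per pixel (and per group) from that pixel's
    #    own feature vector: this is what makes involution location-specific.
    hidden = np.maximum(x @ w_reduce, 0.0)                    # (H, W, C//r), ReLU
    kernels = (hidden @ w_span).reshape(h, w, k, k, groups)   # (H, W, K, K, G)

    # 2) Gather the K x K neighbourhood around every pixel (zero padding).
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    patches = np.stack(
        [np.stack([xp[u:u + h, v:v + w, :] for v in range(k)], axis=2) for u in range(k)],
        axis=2,
    )                                                          # (H, W, K, K, C)

    # 3) Share each group's kernel across the channels in that group, then
    #    multiply-accumulate over the spatial window.
    kernels = np.repeat(kernels, c // groups, axis=-1)         # (H, W, K, K, C)
    return (kernels * patches).sum(axis=(2, 3))                # (H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, K, G, r = 8, 8, 4, 3, 2, 2
    x = rng.standard_normal((H, W, C))
    w_reduce = rng.standard_normal((C, C // r))
    w_span = rng.standard_normal((C // r, K * K * G))
    y = involution2d(x, w_reduce, w_span, kernel_size=K, groups=G)
    print(y.shape)  # (8, 8, 4)
```

The contrast with convolution is in step 1: a standard convolution applies one learned kernel at every spatial position, whereas here the kernel weights change from pixel to pixel with the input, while the learnable parameters (`w_reduce` and `w_span`) stay independent of the image size.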
