Physics Augmented Tuple Transformer for Autism Severity Level Detection

Chinthaka Ranasingha; Harshala Gammulle; Tharindu Fernando; Sridha Sridharan; Clinton Fookes

TL;DR

We propose a novel framework that exploits the laws of physics for ASD severity recognition. It attains state-of-the-art performance on multiple ASD diagnosis benchmarks and is shown to be useful beyond ASD diagnosis.

Abstract

Early diagnosis of Autism Spectrum Disorder (ASD) is an effective and favorable step towards enhancing the health and well-being of children with ASD. Manual ASD diagnostic testing is labour-intensive, complex, and prone to human error, as several factors can contaminate the results. This paper proposes a novel framework that exploits the laws of physics for ASD severity recognition. The proposed physics-informed neural network architecture encodes the subject's behaviour, extracted by observing part of the skeleton-based motion trajectory, in a higher-dimensional latent space. Two decoders, namely a physics-based decoder and a non-physics-based decoder, use this latent embedding to predict future motion patterns. The physics-based branch applies the laws of physics that govern a skeleton sequence during prediction, while the non-physics-based branch is optimised to minimise the difference between the predicted and actual motion of the subject. A classifier uses the same latent embeddings to recognise ASD severity. This dual generative objective explicitly forces the network to compare the subject's actual behaviour with the normal behaviour of children, which is governed by the laws of physics, thereby aiding ASD recognition. The proposed method attains state-of-the-art performance on multiple ASD diagnosis benchmarks. To illustrate the utility of the proposed framework beyond ASD diagnosis, we conduct a third experiment on a publicly available fall prediction benchmark and demonstrate the superiority of our model.
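The dual generative objective described above can be pictured as a single training loss with three terms that all depend on the shared latent embedding. The following is a minimal sketch under stated assumptions, not the paper's actual loss: it assumes mean squared error for both decoders, softmax cross-entropy for the ADOS classifier, and hypothetical weights `lambda_np` and `lambda_p`. Following the Figure 1 caption, the non-physics prediction is compared with the subject's own future frames and the physics prediction with the typically developing (TD) future frames of the same action. All function names are illustrative.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all frames, joints and coordinates."""
    return float(np.mean((pred - target) ** 2))

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy for a single sample."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def patt_style_loss(ados_logits, ados_label,
                    nonphys_pred, subject_future,
                    phys_pred, td_future,
                    lambda_np=1.0, lambda_p=1.0):
    """Hypothetical combined objective (term choices and weights are assumptions).

    - classification term: ADOS severity level predicted from the shared latent
    - non-physics term: match the subject's own future motion
    - physics term: match the typically developing (TD) future motion
    """
    l_cls = cross_entropy(ados_logits, ados_label)
    l_np = mse(nonphys_pred, subject_future)
    l_p = mse(phys_pred, td_future)
    return l_cls + lambda_np * l_np + lambda_p * l_p

# Example with dummy data: 10 future frames, 25 joints, 3D coordinates, 4 severity levels
rng = np.random.default_rng(0)
future = rng.normal(size=(10, 25, 3))
loss = patt_style_loss(rng.normal(size=4), 2, future + 0.1, future, future, future)
```

Because the two reconstruction terms and the classification term are computed from the same latent embedding, gradients from both decoders shape the features the classifier reads; this is one way to see why the decoders can be discarded at inference without losing their influence.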

Paper Structure

This paper contains 19 sections, 22 equations, 7 figures and 5 tables.

Introduction
Related Work
Automated ADOS Prediction Approaches
Transformer-based Skeleton Data Analysis
Methodology
Overall Framework
Encoding the Skeleton Sequence and Action Information
Physics-based Decoder
Non-Physics-based Decoder
ADOS-classifier
Loss Functions
Experiments
Datasets
Implementation Details
Evaluation Metrics
...and 4 more sections

Figures (7)

Figure 1: Illustration of the Physics Augmented Tuple Transformer (PATT) framework. It contains a transformer encoder, a classifier for autism severity level detection, and two decoders, namely a non-physics decoder and a physics decoder, that generate future physical representations. The inputs to our framework are the type of action the subject is performing and the observed skeleton sequence (1..t). The encoder encodes this information as latent position and force embeddings. The non-physics decoder generates the future skeleton sequence (t+1..T) using only the encoded positions, while the physics decoder generates the future skeleton sequence (t+1..T) using both the encoded positions and forces. For supervision, the non-physics decoder receives the subject's own skeleton sequence, which reflects the subject's ASD level, whereas the physics decoder receives the typically developing (TD) sequence of the same action performed by the therapist. As such, the network learns to compare anomalous ASD behaviour with TD behaviour and to identify the autism level. The physics decoder supports this normal-abnormal comparison by decoding the future normal behaviour of the TD subject using the laws of physics.
Figure 2: Overall framework of the Physics Augmented Tuple Transformer (PATT). The inputs to the model are the skeleton sequences and the corresponding action classes. First, the STTFormer-based encoder [sttformer] takes the input and predicts the joint positions (P) and forces (F). The physics-based decoder then takes both the generalized positions and forces and predicts the next state of the skeleton sequence based on Lagrangian dynamics, while the non-physics-based decoder takes only the generalized positions to generate the next state. The decoders are discarded during inference, and a simple feed-forward neural network performs ADOS score prediction using both the generalized positions and forces.
Figure 3: Self-attention scheme that simultaneously captures the relationship between every joint in multiple successive frames.
Figure 4: Spatial-temporal tuples encoding module.
Figure 5: Comparison of classification accuracy between the baseline LSTM model and the proposed PATT model for different ADOS levels on the MMASD [mmasd] dataset.
...and 2 more figures
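The Figure 2 caption states that the physics-based decoder predicts the next skeleton state from generalized positions and forces using Lagrangian dynamics. As a minimal sketch, assuming a Lagrangian L = 1/2 q_dot^T M q_dot - V(q) with a constant diagonal mass matrix M and a gravity-only potential V(q) = sum_i m_i g h_i (h_i being the vertical coordinate of joint i), the Euler-Lagrange equations with generalized forces F reduce to M q_ddot = F - dV/dq, which can be integrated with a semi-implicit Euler step. The per-joint masses, the frame interval `dt`, and the choice of vertical axis are hypothetical; the paper's decoder may learn a state-dependent mass matrix or additional terms.

```python
import numpy as np

def lagrangian_step(q, q_dot, forces, masses, dt=1.0 / 30.0, g=9.81, up_axis=1):
    """One semi-implicit Euler step of M q_ddot = F - dV/dq.

    Assumptions (illustrative, not taken from the paper):
      q, q_dot, forces : (V, 3) joint positions, velocities, generalized forces
      masses           : (V,) per-joint point masses (constant diagonal M)
      V(q) = sum_i m_i * g * q_i[up_axis]   (gravity is the only potential)
    """
    grad_potential = np.zeros_like(q)
    grad_potential[:, up_axis] = masses * g                # dV/dq for gravity
    q_ddot = (forces - grad_potential) / masses[:, None]   # M^{-1} (F - dV/dq)
    q_dot_next = q_dot + dt * q_ddot                       # update velocity first
    q_next = q + dt * q_dot_next                           # then position (semi-implicit)
    return q_next, q_dot_next

def rollout(q, q_dot, force_seq, masses, dt=1.0 / 30.0):
    """Predict future frames t+1..T given one force array per future frame."""
    frames = []
    for forces in force_seq:
        q, q_dot = lagrangian_step(q, q_dot, forces, masses, dt)
        frames.append(q)
    return np.stack(frames)  # (len(force_seq), V, 3)
```

Updating the velocity before the position (semi-implicit Euler) is a common choice for this kind of rollout because it tends to stay stable over many steps at little extra cost; when the predicted forces exactly cancel gravity, the skeleton stays in place.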
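Figures 3 and 4 describe self-attention that jointly relates every joint across several successive frames. Below is a minimal single-head sketch in the spirit of STTFormer's spatial-temporal tuples, not its actual implementation: the T frames are assumed to be split into non-overlapping tuples of n consecutive frames, and scaled dot-product attention is computed over all n*V joint tokens inside each tuple. Random matrices stand in for learned projections, and multi-head attention, positional encodings, and any aggregation across tuples are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tuple_self_attention(x, n, w_q, w_k, w_v):
    """Single-head self-attention within tuples of n consecutive frames.

    x             : (T, V, C) skeleton features; T must be divisible by n (assumption)
    w_q, w_k, w_v : (C, D) projection matrices (random here, learned in practice)
    Returns (T, V, D) features in which every joint attends to every joint
    in all n frames of its own tuple, plus the attention weights.
    """
    T, V, C = x.shape
    assert T % n == 0, "sequence length must be a multiple of the tuple size"
    tokens = x.reshape(T // n, n * V, C)          # n consecutive frames -> one token set
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T/n, nV, nV)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                # (T/n, nV, D)
    return out.reshape(T, V, -1), attn
```

With n = 1 each tuple is a single frame and the computation reduces to purely spatial attention among the V joints; larger n lets a joint attend directly to other joints in neighbouring frames, at a cost that grows with (n*V)^2 per tuple.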
