SAND Challenge: Four Approaches for Dysarthria Severity Classification
Gauri Deshpande, Harish Battula, Ashish Panda, Sunil Kumar Kopparapu
TL;DR
The paper compares four distinct approaches to five-class dysarthria severity classification in SAND Task #1, all trained and evaluated on the same dataset and utterance set. A feature-based hierarchical XGBoost pipeline, which combines glottal and formant features in a two-stage cascade, delivers the strongest macro-F1 (~0.86), while the deep learning variants (ViT-OF, 1D-CNN, BiLSTM-OF) reach lower macro-F1 scores (~0.68–0.70) and provide complementary insights into speech impairment. The study highlights the benefits of domain knowledge and tailored fusion strategies in low-data regimes, and suggests potential for hybrid models that fuse engineered features with neural representations. Overall, the results indicate that combining diverse strategies (end-to-end learning and expert-feature methods) can support robust dysarthria classification under challenging data conditions, with clear avenues for future improvement through hybrid models and larger datasets.
Abstract
This paper presents a unified study of four distinct modeling approaches for classifying dysarthria severity in the Speech Analysis for Neurodegenerative Diseases (SAND) challenge. All models tackle the same five-class classification task using a common dataset of speech recordings. We investigate: (1) a ViT-OF method that applies a Vision Transformer to spectrogram images, (2) a 1D-CNN approach using eight 1-D CNNs with majority-vote fusion, (3) a BiLSTM-OF approach using nine BiLSTM models with majority-vote fusion, and (4) a Hierarchical XGBoost ensemble that combines glottal and formant features through a two-stage learning framework. We describe each method and compare their performance on a validation set of 53 speakers. Results show that the feature-engineered XGBoost ensemble achieves the highest macro-F1 (0.86), while the deep learning models (ViT, CNN, BiLSTM) attain macro-F1 scores of about 0.70 and offer complementary insights into the problem.
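As an illustration of two ideas mentioned in the abstract, the following minimal Python sketch shows how majority-vote fusion over several models and the macro-F1 metric could be computed. It is not the authors' implementation: the function names, the integer class labels 0 to 4, and the tie-breaking rule (smallest label wins) are assumptions made only for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse the class labels predicted by several models for one sample.

    Assumption: ties are broken in favour of the smallest label.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores over `labels`."""
    f1_scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples contributes 0.0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical predictions from eight models for three samples.
    per_model = [
        [0, 0, 1, 0, 0, 2, 0, 0],
        [3, 3, 4, 3, 2, 3, 3, 4],
        [1, 2, 2, 2, 1, 2, 2, 2],
    ]
    fused = [majority_vote(p) for p in per_model]
    truth = [0, 3, 1]
    print(fused, macro_f1(truth, fused, labels=[0, 1, 2, 3, 4]))
```

Because macro-F1 averages the per-class scores without weighting by class frequency, a model that performs poorly on a rare severity class is penalized as heavily as one that fails on a common class.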
