Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James Glass
TL;DR
This work tackles automatic prediction of ALS progression from longitudinal speech by introducing ALST, a transformer-based model that fuses pretrained speech representations with longitudinal context to forecast the ALS Functional Rating Scale-Revised (ALSFRS-R) trajectory. The architecture combines a pretrained feature extractor (wav2vec 2.0 or Whisper), phoneme alignment, a longitudinal transformer encoder, and an ALSFRS-R scorer, trained with a joint objective $L_{ALST}$ that blends regression and classification signals. On the ALS TDI dataset, ALST achieves 91.0% AUC, a 5.6% relative improvement over the previous best model, and performs particularly well on rank-based metrics for longitudinal progression. Ablations show that longitudinal information improves the ranking metrics and that both the choice of pretrained encoder and the balance between loss terms affect results. The model offers interpretable, fine-grained progression estimates and provides a practical tool for scalable, patient-specific monitoring, with code publicly available for replication and extension.
Abstract
Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative to manual approaches. We propose the ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and longitudinal information in the recordings, our best model achieves 91.0% AUC, improving upon the previous best model by 5.6% relative on the ALS TDI dataset. Careful analysis reveals that ALST is capable of fine-grained and interpretable predictions of ALS progression, especially for distinguishing between rarer and more severe cases. Code is publicly available.
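
For readers less familiar with the headline metrics, the following is a minimal illustrative sketch, not the authors' evaluation code. It assumes a binary labeling (1 for the more severe class, 0 otherwise) and shows AUC in its pairwise-ranking form, together with the usual definition of a relative improvement. The function names `auc` and `relative_improvement` and the toy labels and scores are hypothetical, chosen only for illustration; this section does not state how the reported AUC is aggregated.

```python
def auc(labels, scores):
    """Area under the ROC curve in its pairwise-ranking form.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.056 means 5.6% relative."""
    return (new - old) / old


# Hypothetical toy data, not taken from the ALS TDI dataset.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

Because AUC depends only on how examples are ranked, not on the absolute score values, it is a natural companion to the rank-based view of progression emphasized above.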
