Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James Glass

TL;DR

This work tackles automatic prediction of ALS progression from longitudinal speech by introducing ALST, a transformer-based model that fuses pretrained speech representations with longitudinal context to forecast the ALSFRS-R trajectory. The architecture combines a pretrained feature extractor (wav2vec 2.0 or Whisper), phoneme alignment, a longitudinal transformer encoder, and an ALSFRS-R scorer, trained with a joint $L_{ALST}$ objective that blends regression and classification signals. On the ALS TDI dataset, ALST achieves an AUC of 91.0% (0.910), a relative improvement of about 5.6% over the previous best model, and performs well on rank-based metrics, particularly when predicting longitudinal progression. Ablations show that longitudinal information improves ranking metrics and that both the choice of pretrained encoder and the balance between the loss terms affect results. The model offers interpretable, fine-grained progression estimates and could serve as a practical tool for scalable, patient-specific monitoring; code is publicly available for replication and extension.
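
To make the idea of a joint objective concrete, the following is a minimal sketch of a loss that blends a regression term with a classification term. The exact form of $L_{ALST}$ is not given in this summary, so the additive form `mse + lambda_ce * ce`, the function names, and the five score classes (0 to 4) are assumptions for illustration only; the weight `lambda_ce` mirrors the $\lambda_{\text{CE}}$ mentioned in the caption of Figure 2.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(pred_score, logits, target, lambda_ce=0.5):
    """Hypothetical joint objective (an assumption, not the paper's formula).

    pred_score: scalar regression output for the speech score
    logits:     one classification logit per score class (assumed 0..4)
    target:     integer ground-truth score
    lambda_ce:  weight on the cross-entropy (classification) term
    """
    mse = (pred_score - target) ** 2              # regression signal
    ce = -math.log(softmax(logits)[target])       # classification signal
    return mse + lambda_ce * ce


# One example: the regressor predicts 2.4 and the classifier favors class 2.
logits = [0.1, 0.2, 2.0, 0.3, 0.1]
loss_reg_only = joint_loss(2.4, logits, target=2, lambda_ce=0.0)
loss_joint = joint_loss(2.4, logits, target=2, lambda_ce=1.0)
```

Setting `lambda_ce=0.0` reduces this sketch to pure regression, which is one way to see how sweeping the weight changes the training signal.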

Abstract

Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative to manual approaches. We propose the ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and longitudinal information in the recordings, our best model achieves 91.0% AUC, improving upon the previous best model by 5.6% relative on the ALS TDI dataset. Careful analysis reveals that ALST is capable of fine-grained and interpretable predictions of ALS progression, especially for distinguishing between rarer and more severe cases. Code is publicly available.
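
As a back-of-the-envelope reading of the "5.6% relative" figure, the sketch below recovers the baseline AUC it implies. It assumes that "relative" means (new - old) / old and that both reported numbers are rounded; the baseline AUC itself is not stated in this summary, so the result is an estimate, and the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(new_value, relative_gain):
    """Baseline implied by new = old * (1 + relative_gain)."""
    return new_value / (1.0 + relative_gain)


new_auc = 0.910          # reported best AUC
relative_gain = 0.056    # reported relative improvement

baseline_auc = implied_baseline(new_auc, relative_gain)   # roughly 0.86
absolute_gain = new_auc - baseline_auc                    # roughly 0.05
```

Under these assumptions the previous best model would sit near 86% AUC, so the gain is close to five absolute AUC points.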

Paper Structure (10 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Architecture of the proposed ALST. The Whisper encoder is frozen during training.
  • Figure 2: (a) Confusion matrix of ALST (R) trained using whisper-medium speech encoders (layer 22); (b) Effect of $\lambda_{\text{CE}}$ on ALSFRS-R speech score prediction performance; (c) Single-phoneme ALS prediction F1 vs phoneme used for the ALST models with different pretrained speech representations. We merge results for the phonemes 'AH0' and 'UW0' due to the multiple possible pronunciations in the second phoneme of the word 'today'.
  • Figure 3: Score prediction performance vs. layer used for feature extraction of the pretrained speech representation models.