Learning a Distance for the Clustering of Patients with Amyotrophic Lateral Sclerosis
Guillaume Tejedor, Veronika Peralta, Nicolas Labroche, Patrick Marcel, Hélène Blasco, Hugo Alarcan
TL;DR
This work addresses the challenge of clustering ALS patients so that the clusters reflect clinically meaningful disease progression, as observed in longitudinal ALSFRS-R sequences. It introduces a framework that extracts descriptive progression variables, combines them using either off-the-shelf distances or a weakly supervised distance learned from labeling functions, and pairs these distances with standard clustering methods (optionally preceded by UMAP) to identify patient subgroups. Evaluated on 353 patients from the Tours dataset, the approach outperforms state-of-the-art methods in survival analysis while keeping internal clustering metrics competitive, and its distance measures make the results easier for clinicians to interpret. Overall, the method offers a practical path toward clinically relevant, interpretable patient stratification and lays the groundwork for generalizing to other progressive diseases and to larger external datasets.
Abstract
Amyotrophic lateral sclerosis (ALS) is a severe disease with a typical survival of 3-5 years after symptom onset. Current treatments offer only limited life extension, and the variability in patient responses highlights the need for personalized care. However, research is hindered by small, heterogeneous cohorts, sparse longitudinal data, and the lack of a clear definition of clinically meaningful patient clusters. Existing clustering methods are few and limited in scope. To address this, we propose a clustering approach that groups sequences using a declarative disease progression score. Our approach integrates medical expertise through multiple descriptive variables and investigates several distance measures that combine these variables, both by reusing off-the-shelf distances and by employing a weakly supervised learning method. We pair these distances with clustering methods and benchmark them against state-of-the-art techniques. Evaluated on a dataset of 353 ALS patients from the University Hospital of Tours, our method outperforms state-of-the-art methods in survival analysis while achieving comparable silhouette scores. In addition, the learned distances enhance the relevance and interpretability of results for medical experts.
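
As a rough illustration of the kind of pipeline summarized above (descriptive variables extracted from score sequences, a distance that combines them, and a silhouette check on a grouping), the following Python sketch may help. Everything in it is an assumption made for illustration: the two features (slope and last score), the fixed weights, the function names, and the synthetic sequences are hypothetical and are not the authors' variables, learned distance, or data.

```python
from math import sqrt

def progression_features(times, scores):
    """Hypothetical descriptive variables: (least-squares slope, last score)."""
    n = len(times)
    t_mean = sum(times) / n
    s_mean = sum(scores) / n
    var_t = sum((t - t_mean) ** 2 for t in times)
    cov = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, scores))
    # Slope in score points per time unit; 0.0 if all visits share one time.
    slope = cov / var_t if var_t else 0.0
    return (slope, float(scores[-1]))

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; the weights stand in for a learned distance."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def silhouette(features, labels, weights):
    """Mean silhouette coefficient under the weighted distance."""
    total = 0.0
    for i, (fi, li) in enumerate(zip(features, labels)):
        # Collect distances from patient i to every other patient, per cluster.
        per_cluster = {}
        for j, (fj, lj) in enumerate(zip(features, labels)):
            if i != j:
                per_cluster.setdefault(lj, []).append(weighted_distance(fi, fj, weights))
        if li not in per_cluster or len(per_cluster) < 2:
            continue  # singleton cluster or single cluster: contributes 0
        a = sum(per_cluster[li]) / len(per_cluster[li])
        b = min(sum(d) / len(d) for lab, d in per_cluster.items() if lab != li)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(features)

# Synthetic sequences (months, scores): two fast and two slow progressors.
patients = [
    ([0, 3, 6], [40, 34, 28]),
    ([0, 3, 6], [42, 35, 29]),
    ([0, 3, 6], [44, 43, 42]),
    ([0, 3, 6], [45, 45, 44]),
]
features = [progression_features(t, s) for t, s in patients]
labels = [0, 0, 1, 1]          # a candidate grouping to evaluate
weights = (1.0, 0.01)          # hand-picked here, not learned
score = silhouette(features, labels, weights)
print(f"silhouette = {score:.3f}")
```

In this sketch the weights are fixed by hand; in the approach described in the abstract, the way variables are combined is either taken from an off-the-shelf distance or learned with weak supervision.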
