Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data
Ritesh Mehta, Aleksandar Pramov, Shashank Verma
TL;DR
The paper investigates predicting ALSFRS-R progression using sensor data from a patient app within the iDPP@CLEF 2024 challenge. It compares a naive baseline, ElasticNet regression, and an LSTM, aided by data augmentation via Task2 and thorough nested grouped cross-validation to handle a small, high-dimensional dataset. A key finding is that the previous ALSFRS-R value is the strongest predictor, with the naive baseline often matching or outperforming more complex models; ElasticNet provides interpretability for feature contributions. The study highlights the potential of sensor data while acknowledging limitations due to dataset size, and it outlines routes for improvement through larger, heterogeneous, and multimodal datasets.
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease with limited treatment options. Its onset patterns and progression trajectories vary widely, so early detection of functional decline is critical for tailoring care and timing therapeutic interventions. This study, conducted as part of the iDPP@CLEF 2024 challenge, uses sensor data collected through an app and provided by the challenge organizers to build machine learning models that forecast the progression of the ALS Functional Rating Scale-Revised (ALSFRS-R) score. We evaluated several predictive models on this data. The temporal dimension of the sensor data was compressed and aggregated with statistical methods, which makes the resulting features easier to interpret and to use in predictive modeling. The best-performing models were a naive baseline and ElasticNet regression. The naive model achieved a Mean Absolute Error (MAE) of 0.20 and a Root Mean Square Error (RMSE) of 0.49, slightly outperforming the ElasticNet model, which recorded an MAE of 0.22 and an RMSE of 0.50. Although the naive approach was marginally more accurate, the ElasticNet model provides a robust framework for understanding feature contributions.
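To make the reported metrics concrete, the following is a minimal sketch of how a naive baseline of this kind can be scored with MAE and RMSE. It assumes that the naive model simply carries the previous ALSFRS-R value forward as its prediction, which is consistent with the TL;DR but not spelled out here. The function names and the score values are hypothetical illustrations and are not taken from the challenge dataset.

```python
import math


def naive_forecast(previous_scores):
    """Assumed naive baseline: predict that each score stays at its previous value."""
    return list(previous_scores)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical scores for illustration only (not challenge data).
previous = [4, 3, 3, 2, 4]   # last known values
observed = [4, 3, 2, 2, 3]   # values at the next assessment

predicted = naive_forecast(previous)
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")  # MAE=0.40 RMSE=0.63
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when errors are concentrated in a few cases. Under the assumption above, a pattern such as MAE 0.20 against RMSE 0.49 would be consistent with scores that usually stay unchanged and occasionally shift by a point or more.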
