Interpretable Features for the Assessment of Neurodegenerative Diseases through Handwriting Analysis
Thomas Thebaud, Anna Favaro, Casey Chen, Gabrielle Chavez, Laureano Moro-Velazquez, Ankur Butala, Najim Dehak
TL;DR
This study investigates handwriting as a digital biomarker for neurodegenerative diseases by collecting 14 tablet-based tasks from 113 participants (including AD, PD, PDM, and CTL) and extracting 76 interpretable features (54 task-agnostic and 22 task-specific). Statistical analyses (Kruskal-Wallis with Dunn post hoc and FDR correction) and binary classifications (Bagging, RF, MLP with nested cross-validation) reveal that task-agnostic features largely differentiate ND groups from controls, with the strongest signals in entropy, duration, and velocity, particularly for AD*. Task-specific features further highlight disease patterns in spirals, writing, and drawing tasks, enabling up to 87% accuracy for AD* vs CTL and up to 69% for PD vs CTL in certain task groups. Correlations show a strong relationship between handwriting features and cognitive function (MoCA), while associations with motor severity (UPDRS-III) are more variable across subdomains. The work introduces a rich, interpretable handwriting dataset and demonstrates the potential for scalable digital biomarkers, while acknowledging limitations related to generalizability, hardware, and the need for multimodal integration for robust clinical translation.
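As a rough illustration of the statistical pipeline named above, the following standard-library sketch computes a Kruskal-Wallis H test for one feature across groups and then applies a Benjamini-Hochberg FDR correction across several features. It is a minimal sketch, not the authors' implementation: the function names (`chi2_sf`, `kruskal_wallis`, `benjamini_hochberg`) are hypothetical, Benjamini-Hochberg is assumed as the FDR procedure, the Dunn post hoc step is omitted, and the p-value relies on the usual chi-square approximation.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(s + 1, z) = Q(s, z) + z**s * exp(-z) / Gamma(s + 1),
    starting from Q(1, z) = exp(-z) (even df) or Q(1/2, z) = erfc(sqrt(z)) (odd df).
    """
    z = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-z)
    else:
        s, q = 0.5, math.erfc(math.sqrt(z))
    while s < df / 2.0:
        q += z ** s * math.exp(-z) / math.gamma(s + 1.0)
        s += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups, each a list of feature values.

    Assumes at least two distinct values overall; otherwise the tie
    correction below divides by zero.
    """
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    start = 0
    while start < n_total:
        end = start
        while end < n_total and pooled[end][0] == pooled[start][0]:
            end += 1
        # Tied values occupy ranks start+1 .. end and share their average rank.
        avg_rank = (start + 1 + end) / 2.0
        for _, g in pooled[start:end]:
            rank_sums[g] += avg_rank
        t = end - start
        tie_term += t ** 3 - t
        start = end
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    h /= 1.0 - tie_term / (n_total ** 3 - n_total)  # tie correction
    return h, chi2_sf(h, len(groups) - 1)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up values for two hypothetical features across three groups.
    features = {
        "feature_a": [[0.9, 1.1, 1.0, 1.2], [1.4, 1.5, 1.3, 1.6], [1.0, 1.2, 1.1, 1.3]],
        "feature_b": [[5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.3, 4.8], [5.2, 4.9, 5.0, 5.1]],
    }
    raw = [kruskal_wallis(groups)[1] for groups in features.values()]
    for name, p_adj in zip(features, benjamini_hochberg(raw)):
        print(name, round(p_adj, 4))
```

In practice, each of the 76 features would yield one p-value per task, and the correction would be applied over that whole family of tests before any pairwise post hoc comparison.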
Abstract
Motor dysfunction is a common sign of neurodegenerative diseases (NDs) such as Parkinson's disease (PD) and Alzheimer's disease (AD), but may be difficult to detect, especially in the early stages. In this work, we examine the behavior of a wide array of interpretable features extracted from the handwriting signals of 113 subjects performing multiple tasks on a digital tablet, as part of the Neurological Signals dataset. The aim is to measure their effectiveness in characterizing NDs, including AD and PD. To this end, task-agnostic and task-specific features are extracted from 14 distinct tasks. Subsequently, through statistical analysis and a series of classification experiments, we investigate which features provide greater discriminative power between NDs and healthy controls and amongst different NDs. Preliminary results indicate that all of the tasks can be effectively leveraged to distinguish between the considered NDs. In particular, our handcrafted interpretable features measuring stability, writing speed, time spent not writing, and pressure variation show statistically significant differences between groups across multiple tasks. Using various binary classification algorithms on the computed features, we obtain up to 87% accuracy for the discrimination between AD and healthy controls (CTL), and up to 69% for the discrimination between PD and CTL.
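Accuracy figures such as those above depend on how the classifiers are tuned and evaluated. The sketch below illustrates nested cross-validation, the evaluation scheme mentioned in the TL;DR, using only the standard library. It is an illustration under stated assumptions, not the authors' pipeline: a k-nearest-neighbour classifier stands in for the Bagging, RF, and MLP models, the folds are not stratified, and all function names and default parameters are hypothetical.

```python
import random
from collections import Counter


def make_folds(indices, n_folds, rng):
    """Shuffle the indices and deal them into n_folds roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_predict(X, y, train_idx, x, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    nearest = sorted(
        train_idx, key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], x))
    )[:k]
    return Counter(y[j] for j in nearest).most_common(1)[0][0]


def accuracy(X, y, train_idx, test_idx, k):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(X, y, train_idx, X[j], k) == y[j] for j in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, k_grid=(1, 3, 5), n_outer=5, n_inner=3, seed=0):
    """Mean outer-fold accuracy, with k chosen on inner folds only."""
    rng = random.Random(seed)
    outer = make_folds(range(len(X)), n_outer, rng)
    outer_scores = []
    for o, test_idx in enumerate(outer):
        # Everything outside the held-out outer fold is available for tuning.
        train_idx = [j for f, fold in enumerate(outer) if f != o for j in fold]
        inner = make_folds(train_idx, n_inner, rng)

        def inner_score(k):
            scores = []
            for i, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != i for j in fold]
                scores.append(accuracy(X, y, fit_idx, val_idx, k))
            return sum(scores) / len(scores)

        # The hyperparameter is selected without ever seeing the outer test fold.
        best_k = max(k_grid, key=inner_score)
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best_k))
    return sum(outer_scores) / len(outer_scores)


if __name__ == "__main__":
    # Made-up one-dimensional feature with two well-separated classes.
    X = [[0.1 * i] for i in range(10)] + [[10 + 0.1 * i] for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(nested_cv(X, y))
```

The point of the two loops is that hyperparameters are selected only on inner folds, so the outer-fold accuracy is not inflated by tuning on the samples used for the final estimate, which matters for a cohort of this size.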
