Using AI to Measure Parkinson's Disease Severity at Home

Md Saiful Islam; Wasifur Rahman; Abdelrahman Abdelkader; Phillip T. Yang; Sangwu Lee; Jamie L. Adams; Ruth B. Schneider; E. Ray Dorsey; Ehsan Hoque

TL;DR

This work tackles remote, home-based assessment of Parkinson's disease severity: individuals perform a finger-tapping task in front of a webcam, and their motor impairment is scored automatically. It combines MediaPipe hand tracking with a curated set of interpretable digital biomarkers, selecting 22 informative features out of 53 and training a LightGBM regressor under leave-one-patient-out cross-validation. The model achieves a mean absolute error of $0.58$ and a Pearson correlation of $0.66$ with ground-truth scores, approaching but not fully matching expert ratings. The study demonstrates strong inter-expert reliability (ICC $=0.88$, Krippendorff's $\alpha=0.69$) and shows that SHAP explanations align with clinically meaningful signals, suggesting the approach could be widely accessible in resource-limited settings. Limitations include a relatively small, imbalanced dataset with few severe cases and confounding from tremor; nonetheless, the approach offers a scalable, interpretable pathway to extend digital biomarkers and remote monitoring to other movement disorders and tasks.
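The TL;DR states that the regressor was evaluated with leave-one-patient-out cross-validation and scored by mean absolute error and Pearson correlation. The sketch below illustrates that evaluation loop in plain Python under stated assumptions: `mean_fit_predict` is a hypothetical placeholder standing in for the LightGBM regressor, and `leave_one_patient_out`, `mean_absolute_error`, and `pearson_r` are illustrative names, not the authors' code. The property that matters is that all rows belonging to one participant are held out together, so the model is never tested on someone it saw during training.

```python
import math
from typing import Callable, List, Sequence

# fit_predict(X_train, y_train, X_test) -> predictions for X_test
FitPredict = Callable[[List[Sequence[float]], List[float], List[Sequence[float]]], List[float]]


def mean_fit_predict(X_train, y_train, X_test):
    """Hypothetical placeholder for the LightGBM regressor: predicts the training-label mean."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


def leave_one_patient_out(X, y, groups, fit_predict: FitPredict) -> List[float]:
    """Out-of-fold predictions; every row of one participant is held out together."""
    preds = [0.0] * len(y)
    for g in dict.fromkeys(groups):  # unique participant IDs, in order of appearance
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        out = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        for i, p in zip(test, out):
            preds[i] = p
    return preds


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between two sets of ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(a, b) -> float:
    """Pearson correlation coefficient of two equal-length rating lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((z - mb) ** 2 for z in b))
```

For example, with labels `[0, 1, 2, 3]` from participants `["p1", "p1", "p2", "p2"]`, each participant's rows are predicted from the other participant's mean, giving `[2.5, 2.5, 0.5, 0.5]` and an MAE of 2.0. In practice the placeholder could be swapped for a wrapper that calls `fit` and then `predict` on `lightgbm.LGBMRegressor`, and scikit-learn's `LeaveOneGroupOut` yields the same participant-level splits; the authors' hyperparameters are not given in this summary.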

Abstract

We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (finger tapping) in front of a webcam, and recordings from 250 participants worldwide were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms that obtain objective measurements aligned with the MDS-UPDRS guideline and strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS-certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (MAE of 0.53). The methodology can be replicated for similar motor tasks, making it possible to evaluate individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.
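The abstract reports an ICC of 0.88 for the neurologists' ratings but does not say which ICC form was used. As an illustration only, the sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, computed from ANOVA mean squares. `icc_2_1` is a hypothetical helper, not the authors' code, and Krippendorff's alpha is not covered here.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater (assumed form).

    ratings[i][j] is rater j's severity score for video i; no missing values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares for videos (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum((ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For example, `icc_2_1([[0, 1], [1, 2], [2, 3]])` returns about 0.67: the two raters rank the videos identically, but the constant one-point offset counts against absolute agreement. Identical ratings across all raters give 1.0.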

Paper Structure

This paper contains 13 sections, 5 figures, and 3 tables.

Introduction
Results
Discussion
Methods
  Hand separation
  Extracting hand key points
  Noise reduction
  Computational features
  Highly correlated feature removal and significance test
  Model training
  Data imbalance
  Evaluation
  Model interpretation

Figures (5)

Figure 1: Overview of the AI-based system for assessing the severity of motor performance. Anyone can perform the finger-tapping task in front of a computer webcam. The system uses a hand-tracking model to locate key points of the hand, enabling continuous tracking of the finger-tapping angle formed by the thumb tip, the wrist, and the index fingertip. After noise is removed from the time series of this angle, the system computes several objective features associated with the severity of motor impairment. An AI model then uses these features to estimate the severity score automatically.
Figure 2: Data collection. Participants, both people with Parkinson's disease (PD) and healthy controls, mostly performed the task in noisy home environments without any clinical supervision. The dataset includes blurry videos caused by poor internet connections, videos in which participants had difficulty following instructions, and videos with overexposed or underexposed backgrounds. These issues are common when collecting data at home, particularly from older adults, who may be less familiar with technology than younger age groups.
Figure 3: How closely the experts and the non-experts agreed in their ratings. Green dots indicate perfect agreement between two raters, while grey, orange, and red dots indicate differences of 1, 2, and 3 points, respectively. No 4-point differences were observed. The high density of green and grey dots and an ICC of 0.88 confirm that the experts had high inter-rater agreement and that the finger-tapping task can be rated reliably when recorded at home. The non-experts, however, were less reliable than the experts, showing only moderate agreement with the three expert raters (the average ICCs between a non-expert's ratings and those of the three experts were 0.72, 0.74, and 0.70, respectively).
Figure 4: Model performance. (a) The predicted severity agrees well with the ground-truth scores. Green dots indicate correct predictions, while grey, orange, and red dots indicate differences of 1, 2, and 3 points between the predicted and actual scores. No 4-point differences were observed. (b) The confusion matrix presents the agreement numerically. (c) The mean absolute error (MAE) measures the average absolute difference between two sets of ratings. The model incurs a slightly higher MAE than an average expert but a substantially lower MAE than the non-experts. (d) The Pearson correlation coefficient (PCC) measures the linear correlation between two sets of ratings. The model's predicted severity ratings are more correlated with the ground-truth scores than the non-experts' ratings (higher PCC) but less correlated than an average expert's ratings (lower PCC).
Figure 5: Data pre-processing. The finger-tapping angle formed by three hand key points (thumb tip, wrist, index fingertip) is plotted as a time series. Panels on the left show the noisy raw signals extracted directly using MediaPipe. After the noise-reduction step, peak angles (red dots) were identified with a custom peak-detection algorithm. Finally, trimming the signal by removing the first and last taps yields the cleaned signal used for analysis, shown on the right. The top panels depict a person with severe tapping difficulty (severity: 3), resulting in low, irregular amplitudes. The middle panels show a person with moderate tapping ability (severity: 2), with slow, interrupted tapping and irregular amplitudes. The bottom panels show a person with good, rhythmic tapping, albeit at a slower speed (severity: 1).
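Figures 1 and 5 describe a signal pipeline: compute the thumb-tip, wrist, and index-fingertip angle in every frame, reduce noise, detect peak angles, and drop the first and last taps before computing features. The sketch below strings these steps together under explicit assumptions: the angle is measured at the wrist as the vertex; landmark indices 0, 4, and 8 are those of MediaPipe's 21-point hand model, with each landmark passed as a plain (x, y, z) tuple rather than a MediaPipe object; a centered moving average stands in for the paper's unspecified noise reduction; the greedy `find_peaks` is a simple stand-in for the authors' custom peak detector; and the three features in the hypothetical `tap_features` are illustrative, not the paper's 53-feature set.

```python
import math
import statistics
from typing import Dict, List, Sequence

# MediaPipe Hands landmark indices in its 21-point hand model.
WRIST, THUMB_TIP, INDEX_FINGER_TIP = 0, 4, 8


def tapping_angle(landmarks: Sequence[Sequence[float]]) -> float:
    """Angle (degrees) at the wrist between the thumb tip and the index fingertip."""
    w = landmarks[WRIST]
    v1 = [a - b for a, b in zip(landmarks[THUMB_TIP], w)]
    v2 = [a - b for a, b in zip(landmarks[INDEX_FINGER_TIP], w)]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cos = sum(a * b for a, b in zip(v1, v2)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def moving_average(signal: List[float], window: int = 5) -> List[float]:
    """Centered moving average (assumed stand-in for the paper's noise reduction)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def find_peaks(signal: List[float], min_height: float, min_distance: int) -> List[int]:
    """Local maxima >= min_height; of two peaks closer than min_distance, keep the taller."""
    peaks: List[int] = []
    for i in range(1, len(signal) - 1):
        if not (signal[i - 1] < signal[i] >= signal[i + 1] and signal[i] >= min_height):
            continue
        if peaks and i - peaks[-1] < min_distance:
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def tap_features(signal: List[float], peaks: List[int], fps: float) -> Dict[str, float]:
    """Hypothetical features computed after dropping the first and last tap."""
    kept = peaks[1:-1]
    if len(kept) < 2:
        raise ValueError("need at least two taps after trimming")
    amps = [signal[p] for p in kept]
    xs = list(range(len(amps)))
    mx, my = statistics.mean(xs), statistics.mean(amps)
    # Least-squares slope of peak angle vs. tap index; negative suggests decrementing amplitude.
    slope = sum((x - mx) * (a - my) for x, a in zip(xs, amps)) / sum((x - mx) ** 2 for x in xs)
    return {
        "taps_per_second": (len(kept) - 1) * fps / (kept[-1] - kept[0]),
        "mean_peak_angle_deg": my,
        "peak_angle_slope_deg_per_tap": slope,
    }
```

A per-video run would map `tapping_angle` over the frames, smooth the result with `moving_average`, pass it to `find_peaks`, and then call `tap_features` with the video's frame rate. If SciPy is available, `scipy.signal.find_peaks` with its `height` and `distance` arguments is a more robust replacement for the greedy detector.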