CARE-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait Assessment
Vida Adeli, Ivan Klabucar, Javad Rajabi, Benjamin Filtjens, Soroush Mehraban, Diwei Wang, Hyewon Seo, Trung-Hieu Hoang, Minh N. Do, Candice Muller, Claudia Oliveira, Daniel Boari Coelho, Pieter Ginis, Moran Gilat, Alice Nieuwboer, Joke Spildooren, Lucas Mckay, Hyeokhyen Kwon, Gari Clifford, Christine Esper, Stewart Factor, Imari Genias, Amirhossein Dadashzadeh, Leia Shum, Alan Whone, Majid Mirmehdi, Andrea Iaboni, Babak Taati
TL;DR
CARE-PD tackles the scarcity of large, clinically annotated PD gait datasets by aggregating nine cohorts across eight sites into anonymized SMPL gait meshes, totaling 18.66 hours and 8,477 walks. It introduces two benchmarks, UPDRS-gait prediction and motion pretext tasks (2D-to-3D lifting and 3D reconstruction), plus four generalization protocols to assess cross-site robustness. Experiments show that pretrained motion encoders retain some clinical signal but struggle with cross-site generalization, while pretraining or fine-tuning on CARE-PD substantially improves both 3D reconstruction error, measured as mean per-joint position error (MPJPE), and UPDRS-gait macro-F1: MPJPE drops from 60.8 mm to 7.5 mm and macro-F1 rises by 17 percentage points. These findings highlight the value of diverse, clinically curated data for robust representation learning and domain adaptation, enabling more objective gait assessment in PD care. The dataset is released under a non-commercial license with privacy protections to accelerate clinical translation of motion AI methods.
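For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how MPJPE and macro-F1 are commonly computed. It is not the CARE-PD benchmark code: the function names `mpjpe` and `macro_f1`, the nested-list layout of poses (frames, then joints, then xyz coordinates), and the toy inputs are all assumptions made for illustration.

```python
import math

def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: nested sequences indexed [frame][joint] -> (x, y, z),
    expressed in the same unit (e.g. mm). Layout is an assumption here.
    Returns the Euclidean distance averaged over all frames and joints.
    """
    dists = [
        math.dist(p, t)
        for pred_frame, target_frame in zip(pred, target)
        for p, t in zip(pred_frame, target_frame)
    ]
    return sum(dists) / len(dists)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Every class counts equally, regardless of how many samples it has,
    which matters when severity scores are imbalanced.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: two frames, two joints, every joint displaced by a 3-4-5 offset.
target = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]] * 2
pred = [[(3.0, 4.0, 0.0), (13.0, 4.0, 0.0)]] * 2
error = mpjpe(pred, target)  # 5.0, in the same unit as the inputs
```

Because macro-F1 averages per-class scores without weighting by class size, a model that ignores a rare severity level is penalized more heavily than it would be under plain accuracy.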
Abstract
Objective gait assessment in Parkinson's Disease (PD) is limited by the absence of large, diverse, and clinically annotated motion datasets. We introduce CARE-PD, the largest publicly available archive of 3D mesh gait data for PD, and the first multi-site collection spanning 9 cohorts from 8 clinical centers. All recordings (RGB video or motion capture) are converted into anonymized SMPL meshes via a harmonized preprocessing pipeline. CARE-PD supports two key benchmarks: supervised clinical score prediction (estimating Unified Parkinson's Disease Rating Scale, UPDRS, gait scores) and unsupervised motion pretext tasks (2D-to-3D keypoint lifting and full-body 3D reconstruction). Clinical prediction is evaluated under four generalization protocols: within-dataset, cross-dataset, leave-one-dataset-out, and multi-dataset in-domain adaptation. To assess clinical relevance, we compare state-of-the-art motion encoders with a traditional gait-feature baseline, finding that encoders consistently outperform handcrafted features. Pretraining on CARE-PD reduces MPJPE (from 60.8 mm to 7.5 mm) and boosts PD severity macro-F1 by 17 percentage points, underscoring the value of clinically curated, diverse training data. CARE-PD and all benchmark code are released for non-commercial research at https://neurips2025.care-pd.ca/.
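As a rough illustration of two of the generalization protocols named above, the sketch below enumerates train/test splits for the cross-dataset and leave-one-dataset-out settings. It reflects one plausible reading of those protocol names, not the released benchmark code: the helper names and the placeholder cohort labels (`cohort_A`, `cohort_B`, `cohort_C`) are hypothetical, and the within-dataset and multi-dataset in-domain adaptation protocols are left out because the abstract does not specify them.

```python
from itertools import permutations

def cross_dataset_pairs(cohorts):
    """Assumed reading of 'cross-dataset': train on one cohort, test on another.

    Returns every ordered (train_cohort, test_cohort) pair.
    """
    return list(permutations(cohorts, 2))

def leave_one_dataset_out(cohorts):
    """Assumed reading of 'leave-one-dataset-out': hold out one cohort for
    testing and train on all the remaining cohorts.

    Yields (train_cohorts, held_out_cohort) tuples, one per cohort.
    """
    for held_out in cohorts:
        train = [c for c in cohorts if c != held_out]
        yield train, held_out

# Placeholder cohort labels; the real CARE-PD cohort names are not used here.
cohorts = ["cohort_A", "cohort_B", "cohort_C"]

for train, test in leave_one_dataset_out(cohorts):
    print(f"train on {train}, test on {test}")
```

Under this reading, a collection of N cohorts gives N leave-one-dataset-out splits and N × (N − 1) cross-dataset pairs, so the 9 cohorts described above would yield 9 and 72 respectively.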
