ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

TL;DR

ActiTect addresses the need for scalable, cross-device screening of REM sleep behavior disorder (RBD) by delivering an open-source, device-agnostic actigraphy pipeline. It combines robust preprocessing (resampling, auto-calibration, bandpass filtering, non-wear detection) with automated sleep–wake segmentation and interpretable motion features, then uses XGBoost to produce nightly RBD scores that are aggregated into a patient-level prediction. Across four cohorts and a leave-one-dataset-out validation, the approach generalizes well (AUROC of $0.95$ under nested cross-validation on the development cohort, $0.84$–$0.94$ on the held-out test cohorts) and shows stable feature importance, supporting a robust, multi-center pre-trained resource for wider deployment. The work highlights the potential for real-world, large-scale RBD screening using wearable actigraphy and provides a transparent, extensible framework for validation and future improvements.
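
The aggregation from nightly scores to a patient-level prediction can be illustrated with a minimal Python sketch. The exact aggregation function is not given in this summary, so the equal-weight blend of mean nightly probability and majority-vote fraction below is an assumption, as are the names `aggregate_nights` and `predict_patient` and both default thresholds of 0.5.

```python
from statistics import mean


def aggregate_nights(night_probs, night_threshold=0.5, weight=0.5):
    """Blend mean nightly probability with the fraction of 'positive' nights.

    Hypothetical aggregation: the weighting and thresholds are illustrative
    assumptions, not the published ActiTect function.
    """
    if not night_probs:
        raise ValueError("at least one night is required")
    mean_prob = mean(night_probs)
    # Fraction of nights whose score votes for RBD.
    vote_fraction = sum(p >= night_threshold for p in night_probs) / len(night_probs)
    return weight * mean_prob + (1 - weight) * vote_fraction


def predict_patient(night_probs, patient_threshold=0.5):
    """Binary patient-level call obtained by thresholding the aggregated risk."""
    return aggregate_nights(night_probs) >= patient_threshold
```

With toy scores such as `[0.9, 0.8, 0.2]`, a single quiet night lowers the aggregated risk but does not flip the patient-level call, which is the kind of night-to-night variability that multi-night aggregation is meant to absorb.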

Abstract

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they are of little practical use without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep–wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84–0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
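
The leave-one-dataset-out (LODO) evaluation can be sketched in a few lines of dependency-free Python. This is an illustrative outline only: the helper names `auroc` and `lodo_splits` are hypothetical, and the AUROC is computed with the standard pairwise (Mann–Whitney) formulation rather than whatever implementation the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lodo_splits(cohort_of):
    """Yield (held_out, train_idx, test_idx), holding out one cohort per fold.

    `cohort_of[i]` is the cohort label of sample i (hypothetical input layout).
    """
    for held_out in sorted(set(cohort_of)):
        train_idx = [i for i, c in enumerate(cohort_of) if c != held_out]
        test_idx = [i for i, c in enumerate(cohort_of) if c == held_out]
        yield held_out, train_idx, test_idx
```

In each fold, a model would be fitted on `train_idx` and scored with `auroc` on `test_idx`, so every cohort serves once as an unseen test set.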

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: ActiTect pipeline overview. (a) Preprocessing. Raw actigraphy data from different devices are standardized through a dedicated preprocessing module, which mitigates systematic differences in signal distribution and enables generalizable motion feature extraction for downstream tasks. The pipeline further performs automated detection of sleep periods and non-wear episodes, reducing the need for manual annotations and enabling consistent analysis across large-scale datasets. (b) Feature Extraction. From detected sleep bouts, we extract meaningful motion features that characterize nocturnal activity patterns relevant to RBD. Local features are computed for each activity bout, then aggregated to derive global descriptors representing the entire night. (c) Predictive Model. Each night’s extracted global motion features are mapped to an RBD probability score using boosted decision trees (XGBoost). These nightly scores are then aggregated into a patient-level risk score via a custom function that combines mean-probability thresholding and majority voting. The final binary RBD prediction is obtained by thresholding each patient’s aggregated risk score.
  • Figure 2: Robust preprocessing for generalizable RBD detection. Overview of preprocessing steps and validation; cohort colors match the legend (bottom center). (a) Resampling. Cumulative clock drift over recording time. Raw actigraphy signals sampled at a nominal $100\,\textrm{Hz}$ show substantial timing drift due to internal clock inaccuracies. Resampling corrects this drift to within numerical precision, as evidenced by near-identical post-resampling curves across cohorts. (b) Calibration. Initial calibration error $\epsilon_0$ vs. reduction efficiency $1 - \epsilon/\epsilon_0$, where $\epsilon$ is the post-calibration error. Calibration is highly effective across cohorts, with $\textrm{mean}\pm\textrm{SD}\,[95\%\,\textrm{CI}]$ efficiencies of $0.93\pm0.04\,[0.92,0.94]$ (CogTrAiL-RBD), $0.87\pm0.04\,[0.85,0.89]$ (Local Test), and $0.91\pm0.05\,[0.91,0.92]$ (OPDC). Higher initial errors yield greater correction gains. (c) Filtering. Amplitude spectral density (ASD) before (dotted line) and after (solid line) bandpass filtering, highlighting suppression of noise outside the $0.8$–$20\,\textrm{Hz}$ passband while preserving signal power within it. Cohort-averaged ASDs (Welch’s method) align closely outside the band but show greater variability (SD shown by shaded area) within it, supporting the choice of frequency cutoffs that isolate signal-dominated activity. Retention and suppression scores were $0.78\pm0.01\,[0.77,0.78]\,/\,0.89\pm0.01\,[0.89,0.90]$ for CogTrAiL-RBD data, and $0.73\pm0.09\,[0.70,0.77]\,/\,0.90\pm0.01\,[0.89,0.90]$ for Local Test data. (d) Sleep Detection. Comparison of automatically detected sleep onset and wake-up times with reference values from sleep diaries (CogTrAiL-RBD, $n=756$) and PSG (Local Test, $n=32$). Subfigure (i) displays predicted and reference times in clock format; the closer the connecting lines are to perfectly radial, the stronger the temporal alignment. Subfigure (ii) shows a scatter plot of automated versus reference times. Strong agreement is evidenced by Pearson correlation coefficients of $0.994\pm0.001\,[0.994,0.995]$ (CogTrAiL-RBD) and $0.996\pm0.001\,[0.992,0.998]$ (Local Test), and by mean absolute errors of $34.4\pm40.9\,[31.5,37.3]$ minutes (CogTrAiL-RBD) and $35.8\pm45.5\,[19.4,52.3]$ minutes (Local Test). The relatively large SDs compared to the means reflect some high-variance nights, while the narrow confidence intervals suggest that the mean error estimates remain robust at the group level.
  • Figure 3: Predictive RBD Modeling Results. (a) Violin plots of two selected features illustrating distributional shifts between individuals with RBD and healthy controls. P-values are computed using two-sided Mann–Whitney U tests, and effect sizes $\left(\delta\right)$ are reported as Cliff’s delta. These features are discussed in more detail at the end of the results section. (b) ROC curves of the nested cross-validation results of the night-level prediction (left) and after aggregation to the patient level (right). The blue line indicates the mean over all folds, and the shaded area represents the 95% confidence interval. The improved performance after aggregation reflects the benefit of multi-night actigraphy, which helps mitigate night-to-night variability in motor activity. (c) Calibration curve on the night level using predictions from nested cross-validation. Triangles indicate the observed positive rate per probability bin; the shaded region shows the 95% CI across folds. The predicted probabilities are well calibrated and closely reflect the true likelihood of RBD. (d) Radar plot summarizing classifier performance across multiple evaluation metrics for the external test sets. Results are shown separately for the Local Test cohort (cyan), the OPDC cohort (magenta), and the PACE cohort (dark orange), where (iRBD) and (all-RBD) denote the respective classification tasks (see the validation results table). The results indicate robust and balanced generalization with a subtle emphasis on recall.
  • Figure 4: Unified Multi-Center Model: Cross-Cohort Performance and Model Stability. (a) LODO Performance. ROC curves of the leave-one-dataset-out (LODO) cross-validation. Each curve corresponds to one fold, with one dataset held out for testing while the others were used for training. The results show consistently high discrimination across datasets, indicating robust generalization. (b) Feature Ranking Stability. Spearman’s rank correlation of ActiTect’s inherent feature rankings across LODO folds, indicating consistently strong agreement (moderate-to-strong for OPDC holdout) and supporting the robustness of the final model. (c) Feature Selection Stability & Ablation. Model performance as a function of features retained from a consensus ranking. Performance peaks at around $20$ features (the stable “core”), while the mean selected count across 20 seeded runs is slightly higher, at $29.74^{+1.42}_{-1.42}$, with a narrow band, indicating stable selection. Beyond this range, performance saturates and remains stable. (d) Hyperparameter Stability. Stability scores from repeated LODO runs ($n=20$) show that nearly all hyperparameters are highly stable, with only minor variability in a subset of hyperparameters. Overall, the training procedure converges to consistent configurations across cohorts, underscoring the robustness of the pipeline.
  • Figure 5: Spearman’s rank correlations of feature importance rankings derived within individual datasets. Correlations ranged from 0.37 to 0.70, indicating moderate-to-strong agreement overall while still reflecting cohort-specific feature preferences. This variability underscores the value of pooling data to optimize the generalizability of feature sets.
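
The feature-ranking agreement reported in Figures 4(b) and 5 rests on Spearman’s rank correlation. As a minimal sketch, assuming each dataset yields a complete, tie-free ordering of the same feature set, the coefficient reduces to the closed form $\rho = 1 - 6\sum_i d_i^2 / \left(n(n^2-1)\right)$, where $d_i$ is the difference in rank of feature $i$ between two orderings. The function name `spearman_from_rankings` and the feature names in the usage note are hypothetical.

```python
def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman's rho between two orderings of the same features (no ties).

    Each ranking is a list of feature names, most important first.
    """
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two features")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b) or len(ranking_b) != n:
        raise ValueError("rankings must be permutations of the same unique features")
    # Position of every feature in the second ranking.
    position_b = {feature: rank for rank, feature in enumerate(ranking_b)}
    # Sum of squared rank differences.
    d_squared = sum((rank - position_b[feature]) ** 2
                    for rank, feature in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For example, swapping only the top two of four placeholder features, as in `spearman_from_rankings(["f1", "f2", "f3", "f4"], ["f2", "f1", "f3", "f4"])`, still gives a high correlation, whereas a fully reversed ordering gives the minimum value of $-1$.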