Motion-Robust Multimodal Fusion of PPG and Accelerometer Signals for Three-Class Heart Rhythm Classification
Yangyang Zhao, Matti Kaisti, Olli Lahdenoja, Tero Koivisto
TL;DR
The paper addresses robust rhythm classification from wrist-worn PPG under motion, extending binary AF detection to three classes: AF, sinus rhythm (SR), and Other. It introduces RhythmiNet, a multimodal architecture that fuses PPG and accelerometer (ACC) data through a residual backbone with squeeze-and-excitation (SE) blocks and a temporal attention module, operating on four input channels over 30-second segments. Using data from 49 elderly inpatients, with roughly 1000 hours of training data and test data stratified by motion intensity, RhythmiNet improves macro-AUC by 4.3 percentage points over a PPG-only baseline and outperforms an HRV-based logistic regression model by about 12%. Each segment's motion score is the variance, over the segment, of the ACC magnitude $\sqrt{\mathrm{acc}_x(t)^2 + \mathrm{acc}_y(t)^2 + \mathrm{acc}_z(t)^2}$, which allows performance to be evaluated across motion intensities without excluding any data. Overall, the results indicate that simple multimodal fusion with attention yields robust rhythm classification on real-world wearable data, with clear implications for continuous cardiac monitoring.
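The motion score and the percentile-based stratification can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the use of population variance, the choice of quartile bins, and the function names `motion_score` and `stratify_by_percentile` are hypothetical and are not taken from the paper.

```python
import math
import statistics
from bisect import bisect_right


def motion_score(acc_x, acc_y, acc_z):
    """Variance of the ACC magnitude sqrt(x^2 + y^2 + z^2) over one segment.

    Assumption: population variance; the paper may use a different estimator.
    """
    magnitude = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(acc_x, acc_y, acc_z)]
    return statistics.pvariance(magnitude)


def stratify_by_percentile(scores, n_bins=4):
    """Assign each segment to a motion-intensity bin (0 = least motion).

    Hypothetical binning: cut points are the n_bins-quantiles of all scores.
    No segment is excluded; every score lands in exactly one bin.
    """
    cuts = statistics.quantiles(scores, n=n_bins)  # n_bins - 1 cut points
    return [bisect_right(cuts, s) for s in scores]


# A still segment (constant 1 g on z) vs. one with oscillating x-axis motion.
still = motion_score([0.0] * 4, [0.0] * 4, [1.0] * 4)
moving = motion_score([0.0, 0.5, 0.0, -0.5], [0.0] * 4, [1.0] * 4)
bins = stratify_by_percentile([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], n_bins=4)
```

Because each segment is binned rather than filtered, the same test set can be scored per motion bin, which matches the stated goal of evaluating across motion intensities without discarding data.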
Abstract
Atrial fibrillation (AF) is a leading cause of stroke and mortality, particularly in elderly patients. Wrist-worn photoplethysmography (PPG) enables non-invasive, continuous rhythm monitoring, yet it is highly vulnerable to motion artifacts and physiological noise. Many existing approaches rely solely on single-channel PPG and are limited to binary AF detection, often failing to capture the broader range of arrhythmias encountered in clinical settings. We introduce RhythmiNet, a residual neural network enhanced with temporal and channel attention modules that jointly leverages PPG and accelerometer (ACC) signals. The model performs three-class rhythm classification: AF, sinus rhythm (SR), and Other. To assess robustness across varying movement conditions, the test data are stratified by accelerometer-based motion-intensity percentiles without excluding any segments. RhythmiNet achieved a 4.3% improvement in macro-AUC over the PPG-only baseline. In addition, its performance surpassed that of a logistic regression model based on handcrafted HRV features by 12%, highlighting the benefit of multimodal fusion and attention-based learning on noisy, real-world clinical data.
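The channel attention (SE) and temporal attention ideas can be illustrated with a small NumPy sketch. It assumes a feature map of shape (channels, time) produced by a residual backbone, a standard squeeze-and-excitation gate with reduction ratio `r`, and softmax attention pooling over time followed by a three-way softmax. The shapes, random weights, class order, and function names are hypothetical stand-ins, and RhythmiNet's actual layers may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def squeeze_excitation(x, w1, w2):
    """Channel attention (SE) on a feature map x of shape (C, T)."""
    z = x.mean(axis=1)                      # squeeze: average over time -> (C,)
    s = np.maximum(0.0, w1 @ z)             # excitation bottleneck with ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # sigmoid gate in (0, 1) per channel -> (C,)
    return x * gate[:, None]                # rescale each channel


def temporal_attention_pool(x, v):
    """Softmax-weighted pooling over time: x (C, T), v (C,) -> (C,)."""
    scores = v @ x                          # one relevance score per time step -> (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over time steps
    return x @ weights, weights


def classify(features, w_out):
    """Linear head + softmax over the three rhythm classes."""
    logits = w_out @ features
    p = np.exp(logits - logits.max())
    return p / p.sum()


# Hypothetical sizes: 16 feature channels, 60 time steps, reduction ratio 4.
C, T, r = 16, 60, 4
x = rng.standard_normal((C, T))             # stand-in for the backbone output
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
v = rng.standard_normal(C) * 0.1
w_out = rng.standard_normal((3, C)) * 0.1

x_se = squeeze_excitation(x, w1, w2)
pooled, attn = temporal_attention_pool(x_se, v)
probs = classify(pooled, w_out)             # hypothetical order: [AF, SR, Other]
```

The SE gate reweights channels (for example, down-weighting a PPG-derived channel when ACC-derived channels indicate heavy motion, if the model learns to do so), while the temporal weights let the pooled representation emphasize cleaner portions of the 30-second window.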
