Motion-Robust Multimodal Fusion of PPG and Accelerometer Signals for Three-Class Heart Rhythm Classification

Yangyang Zhao, Matti Kaisti, Olli Lahdenoja, Tero Koivisto

TL;DR

The paper tackles robust three-class rhythm classification from wrist-worn PPG under motion by extending AF detection to AF, SR, and Other. It introduces RhythmiNet, a multimodal architecture that fuses PPG and accelerometer data via a residual backbone with SE blocks and a temporal attention module, using four input channels on 30-second segments. Evaluated on data from 49 elderly inpatients with roughly 1000 hours of training data and test data stratified by motion, RhythmiNet achieves a macro-AUC improvement of 4.3 percentage points over a PPG-only baseline and outperforms HRV-based logistic regression by ~12%. Segment-level motion scores were computed as the variance of the motion magnitude $\sqrt{\mathrm{acc}_x(t)^2 + \mathrm{acc}_y(t)^2 + \mathrm{acc}_z(t)^2}$, enabling evaluation across motion intensities without excluding data. Overall, the approach demonstrates that simple multimodal fusion with attention yields robust rhythm classification in real-world wearable data, with clear implications for continual cardiac monitoring.
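The segment-level motion score described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `motion_score` is hypothetical, the use of population variance (rather than sample variance) is an assumption, and the 960-sample segment length simply follows from 30 seconds at 32 Hz.

```python
import math
from statistics import pvariance


def motion_score(acc_x, acc_y, acc_z):
    """Variance of the per-sample acceleration magnitude over one segment.

    Hypothetical helper: population variance is an assumption, since the
    summary only says "variance of the motion magnitude".
    """
    if not (len(acc_x) == len(acc_y) == len(acc_z)):
        raise ValueError("all three axes must have the same number of samples")
    magnitude = [
        math.sqrt(x * x + y * y + z * z)
        for x, y, z in zip(acc_x, acc_y, acc_z)
    ]
    return pvariance(magnitude)


# Synthetic 30-second segments at 32 Hz (960 samples), for illustration only.
n = 960
still = motion_score([0.0] * n, [0.0] * n, [1.0] * n)        # constant magnitude
moving = motion_score([0.0, 1.0] * (n // 2), [0.0] * n, [1.0] * n)  # oscillating x axis
```

A segment with constant magnitude scores zero, while any fluctuation in magnitude raises the score, which is what makes the quantity usable as a per-segment motion intensity.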

Abstract

Atrial fibrillation (AF) is a leading cause of stroke and mortality, particularly in elderly patients. Wrist-worn photoplethysmography (PPG) enables non-invasive, continuous rhythm monitoring, yet suffers from significant vulnerability to motion artifacts and physiological noise. Many existing approaches rely solely on single-channel PPG and are limited to binary AF detection, often failing to capture the broader range of arrhythmias encountered in clinical settings. We introduce RhythmiNet, a residual neural network enhanced with temporal and channel attention modules that jointly leverage PPG and accelerometer (ACC) signals. The model performs three-class rhythm classification: AF, sinus rhythm (SR), and Other. To assess robustness across varying movement conditions, test data are stratified by accelerometer-based motion intensity percentiles without excluding any segments. RhythmiNet achieved a 4.3% improvement in macro-AUC over the PPG-only baseline. In addition, performance surpassed a logistic regression model based on handcrafted HRV features by 12%, highlighting the benefit of multimodal fusion and attention-based learning in noisy, real-world clinical data.
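One way to stratify test segments by motion percentile while keeping every segment is sketched below. The cut points (25th, 50th, 75th percentiles), the use of disjoint bins, and the function names are assumptions made for illustration; the abstract does not state which percentiles or binning scheme the authors used.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def stratify_by_motion(scores, cut_points=(25, 50, 75)):
    """Return one bin index per segment; every segment gets a bin.

    Bin 0 holds the lowest-motion segments and the last bin the highest.
    The cut points are hypothetical defaults, not values from the paper.
    """
    ordered = sorted(scores)
    thresholds = [percentile(ordered, q) for q in cut_points]
    # A segment's bin is the number of thresholds its score exceeds.
    return [sum(score > t for t in thresholds) for score in scores]


# Toy motion scores for eight test segments.
scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.8, 0.7, 0.5]
bins = stratify_by_motion(scores)
```

Because the bins are derived from the test set's own score distribution, no segment is excluded for being too noisy; high-motion data simply lands in the upper bins and is evaluated separately.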

Paper Structure

This paper contains 9 sections, 3 figures.

Figures (3)

  • Figure 1: Overview of the data collection and motion stratification process. A wrist-worn Philips Datalogger recorded both PPG and tri-axial ACC signals, while a chest-worn Bittium Faros device recorded reference ECG. ECG provided rhythm labels (AF, SR, Other) used for training and testing. Segment-level motion scores were computed from the ACC magnitude and used to stratify test data into percentile-based motion levels. Example PPG and ACC traces are shown for two motion conditions: low-motion (top) and high-motion (bottom).
  • Figure 2: Overview of the RhythmiNet architecture for three-class heart rhythm classification (SR, AF, Other). The model processes 30-second segments of synchronized PPG and tri-axial ACC signals (960 samples at 32 Hz). Inputs are concatenated and passed through a convolutional stem, followed by two residual blocks enhanced with Squeeze-and-Excitation (SE) modules to emphasize informative channels. A temporal attention module captures long-range dependencies across time. Finally, global average pooling and a fully connected layer form the classification head. Dashed boxes illustrate the internal structure of selected modules (e.g., residual blocks and attention).
  • Figure 3: Performance across motion intensity percentiles: (Left) Macro-AUC, (Middle) Micro-AUC, and (Right) Accuracy.
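The three metrics reported in Figure 3 differ in how they aggregate over classes. The sketch below assumes the common one-vs-rest definitions: macro-AUC averages the per-class AUCs, while micro-AUC pools every (segment, class) pair into a single binary problem. Whether the authors used exactly these definitions is not stated here, and the class indices (0 = SR, 1 = AF, 2 = Other) and toy probabilities are made up for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, probs, n_classes=3):
    """Unweighted mean of one-vs-rest AUCs, so each class counts equally."""
    per_class = [
        binary_auc([int(y == c) for y in y_true], [p[c] for p in probs])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


def micro_auc(y_true, probs, n_classes=3):
    """Single AUC over all (segment, class) pairs pooled together."""
    labels = [int(y == c) for y in y_true for c in range(n_classes)]
    scores = [p[c] for p in probs for c in range(n_classes)]
    return binary_auc(labels, scores)


def accuracy(y_true, probs):
    """Fraction of segments whose highest-probability class is the true class."""
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(int(t == p) for t, p in zip(y_true, predicted)) / len(y_true)


# Toy example: 0 = SR, 1 = AF, 2 = Other (hypothetical encoding).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.5, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
macro = macro_auc(y_true, probs)
micro = micro_auc(y_true, probs)
acc = accuracy(y_true, probs)
```

Macro averaging is the more informative headline number when classes are imbalanced, because a rare class such as Other contributes as much as a frequent one; micro averaging and accuracy are dominated by whichever rhythm is most common in the test set.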