
SignalMC-MED: A Multimodal Benchmark for Evaluating Biosignal Foundation Models on Single-Lead ECG and PPG

Fredrik K. Gustafsson, Xiao Gu, Mattia Carletti, Patitapaban Palo, David W. Eyre, David A. Clifton

TL;DR

SignalMC-MED is a benchmark for evaluating biosignal FMs on synchronized single-lead electrocardiogram (ECG) and photoplethysmogram (PPG) data. On this benchmark, domain-specific biosignal FMs consistently outperform general time-series models, and multimodal ECG + PPG fusion yields robust improvements over unimodal inputs.

Abstract

Recent biosignal foundation models (FMs) have demonstrated promising performance across diverse clinical prediction tasks, yet systematic evaluation on long-duration multimodal data remains limited. We introduce SignalMC-MED, a benchmark for evaluating biosignal FMs on synchronized single-lead electrocardiogram (ECG) and photoplethysmogram (PPG) data. Derived from the MC-MED dataset, SignalMC-MED comprises 22,256 visits, each with 10 minutes of temporally overlapping ECG and PPG signals, and includes 20 clinically relevant tasks spanning demographic prediction, emergency department disposition, laboratory value regression, and detection of prior ICD-10 diagnoses. Using this benchmark, we perform a systematic evaluation of representative time-series and biosignal FMs across ECG-only, PPG-only, and ECG + PPG settings. We find that domain-specific biosignal FMs consistently outperform general time-series models, and that multimodal ECG + PPG fusion yields robust improvements over unimodal inputs. Moreover, using the full 10-minute signal consistently outperforms shorter segments, and larger model variants do not reliably outperform smaller ones. Hand-crafted ECG domain features provide a strong baseline and offer complementary value when combined with learned FM representations. Together, these results establish SignalMC-MED as a standardized benchmark and provide practical guidance for evaluating and deploying biosignal FMs.

Paper Structure

This paper contains 4 sections, 3 figures, and 4 tables.

Table of Contents

  1. Section
  2. Results

Figures (3)

  • Figure 1: (a) Overview of the SignalMC-MED benchmark and evaluation framework. For each of the 22,256 visits, synchronized 10-minute single-lead ECG (black) and PPG (red) signals are divided into non-overlapping 10-second segments. A frozen FM (blue) extracts a feature vector for each 10-second segment. The resulting segment-level features are aggregated via mean pooling ($\Sigma$) to form a visit-level representation, which is used to train a linear prediction model (green) across 20 downstream tasks: age regression, sex classification, emergency department disposition classification, laboratory value regression (8 tasks), and prior ICD-10 diagnosis classification (9 tasks). (b) Examples of synchronized ECG and PPG signals from three different visits. For each visit, five 3-second segments are shown, evenly sampled from the full 10-minute signal (segment start times indicated above each panel). The signals illustrate inter-patient variability in rhythm, morphology, amplitude, and noise characteristics. Additional examples are provided in the supplementary material.
  • Figure 3: Main model ranking on the test set, computed separately for (a) ECG-only, (b) PPG-only, and (c) ECG + PPG inputs. Mean model rank (lower is better) across the five aggregated task categories, based on the main test-set results table. A joint ranking across modalities is given in a separate table.