Dual-Phase Cross-Modal Contrastive Learning for CMR-Guided ECG Representations for Cardiovascular Disease Assessment

Laura Alvarez-Florez, Angel Bujalance-Gomez, Femke Raijmakers, Samuel Ruiperez-Campillo, Maarten Z. H. Kolk, Jesse Wiers, Julia Vogt, Erik J. Bekkers, Ivana Išgum, Fleur V. Y. Tjong

TL;DR

A contrastive learning framework is introduced that improves the extraction of clinically relevant cardiac phenotypes from ECG by learning from paired ECG-CMR data, and could enable scalable and cost-effective extraction of image-derived traits from ECG.

Abstract

Cardiac magnetic resonance imaging (CMR) offers detailed evaluation of cardiac structure and function, but its limited accessibility restricts use to selected patient populations. In contrast, the electrocardiogram (ECG) is ubiquitous and inexpensive, and provides rich information on cardiac electrical activity and rhythm, yet offers limited insight into underlying cardiac structure and mechanical function. To address this, we introduce a contrastive learning framework that improves the extraction of clinically relevant cardiac phenotypes from ECG by learning from paired ECG-CMR data. Our approach aligns ECG representations with 3D CMR volumes at end-diastole (ED) and end-systole (ES), with a dual-phase contrastive loss to anchor each ECG jointly with both cardiac phases in a shared latent space. Unlike prior methods limited to 2D CMR representations with or without a temporal component, our framework models 3D anatomy at both ED and ES phases as distinct latent representations, enabling flexible disentanglement of structural and functional cardiac properties. Using over 34,000 ECG-CMR pairs from the UK Biobank, we demonstrate improved extraction of image-derived phenotypes from ECG, particularly for functional parameters ($\uparrow$ 9.2%), while improvements in clinical outcome prediction remained modest ($\uparrow$ 0.7%). This strategy could enable scalable and cost-effective extraction of image-derived traits from ECG. The code for this research is publicly available.
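
The abstract describes a dual-phase contrastive loss that anchors each ECG embedding to both the ED and the ES CMR embedding of the same subject. The sketch below illustrates one way such a loss could be formed. It is not the authors' implementation: the symmetric InfoNCE form, the temperature of 0.1, the equal weighting of the two phases, and all function and parameter names are assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.1):
    """One-directional InfoNCE: anchors[i] should match targets[i].

    All other rows of `targets` in the batch act as negatives.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every target, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)


def symmetric_info_nce(x, y, temperature=0.1):
    """Average of the x-to-y and y-to-x InfoNCE terms (assumed form)."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def dual_phase_loss(ecg, cmr_ed, cmr_es, temperature=0.1, ed_weight=0.5):
    """Hypothetical dual-phase loss: ECG is contrasted with ED and ES separately.

    `ed_weight` is an assumed hyperparameter; 0.5 weights both phases equally.
    """
    loss_ed = symmetric_info_nce(ecg, cmr_ed, temperature)
    loss_es = symmetric_info_nce(ecg, cmr_es, temperature)
    return ed_weight * loss_ed + (1.0 - ed_weight) * loss_es
```

Keeping the ED and ES terms separate, rather than concatenating the two CMR embeddings into one target, mirrors the abstract's description of the two phases as distinct latent representations. Under this sketch, the loss is small when each ECG embedding is closest to its own subject's ED and ES embeddings and grows when either phase is mismatched.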

Paper Structure (7 sections, 2 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of the method. (A) ECG and CMR training: A ViT is trained on ECGs using masked autoencoding, while separate CNN encoders for ED and ES CMR volumes are trained to predict imaging phenotypes. (B) Latent space alignment: ECG embeddings are aligned simultaneously with both the ED and ES CMR embeddings via a contrastive loss, enabling the ECG encoder to capture structural and temporal cardiac features. (C) Downstream prediction: After contrastive learning, the CMR-guided ECG representations are fed to an MLP trained to predict CMR phenotypes and clinical endpoints.
  • Figure 2: Reconstructed ECG signals for one subject are shown alongside the original 12-lead recordings. The reconstructions display smoother temporal patterns and reduced noise while preserving key waveform characteristics.
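
Figure 1 (A) states that the ECG encoder is pretrained with masked autoencoding, and Figure 2 shows the resulting reconstructions. The masking step of such a scheme can be sketched as below. This is only an illustration under stated assumptions: the patch length, the mask ratio, the per-lead masking, and the function name `mask_ecg_patches` are hypothetical and are not taken from the paper. Hidden patches are zero-filled here for readability, whereas masked autoencoders typically drop the hidden tokens from the encoder input instead.

```python
import random


def mask_ecg_patches(ecg, patch_len=50, mask_ratio=0.75, seed=0):
    """Hide a random subset of fixed-length patches in each ECG lead.

    ecg: list of leads, each a list of float samples.
    Returns (masked_ecg, mask), where mask[lead][patch] is True if that
    patch was hidden. Samples beyond the last full patch are left untouched.
    """
    rng = random.Random(seed)
    masked, mask = [], []
    for lead in ecg:
        n_patches = len(lead) // patch_len
        n_hidden = int(mask_ratio * n_patches)  # floor of the requested share
        hidden = set(rng.sample(range(n_patches), n_hidden))
        out = list(lead)
        for p in hidden:
            start = p * patch_len
            out[start:start + patch_len] = [0.0] * patch_len
        masked.append(out)
        mask.append([p in hidden for p in range(n_patches)])
    return masked, mask
```

A reconstruction objective would then be computed only on the hidden patches, using `mask` to select them.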