Learning ECG Image Representations via Dual Physiological-Aware Alignments

Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma, Bin Zhu, Pan Zhou

Abstract

Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of the world's ECG data exists only in image form. However, most existing automated ECG analysis methods require access to raw signal recordings, which limits their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised framework for learning clinically generalizable representations from ECG images through dual physiological-aware alignments. First, our approach optimizes image representation learning using multimodal contrastive alignment between the image modality and the gold-standard signal and text modalities. Second, we integrate domain knowledge via soft-lead constraints, which regularize the reconstruction process and improve consistency across signal leads. Extensive benchmarking across multiple datasets and downstream tasks shows that our image-based model outperforms existing image baselines and notably narrows the performance gap between image-based and signal-based ECG analysis. These results highlight the potential of self-supervised image modeling to unlock large-scale legacy ECG data and broaden access to automated cardiovascular diagnostics.
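The abstract does not state the exact form of the contrastive objective. A common choice for cross-modal alignment of this kind is a CLIP-style symmetric InfoNCE loss, and the sketch below assumes that choice, a temperature of 0.07, and embedding batches in which row i of the image, signal, and text matrices comes from the same ECG. All function names are hypothetical, and summing an image-signal term with an image-text term is only one plausible way to combine the two targets; it is not necessarily how ECG-Scan does it.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(anchor: np.ndarray, target: np.ndarray, temperature: float = 0.07) -> float:
    """CLIP-style loss (assumed form): row i of `anchor` and row i of `target` are the
    positive pair; every other row in the batch serves as a negative."""
    logits = l2_normalize(anchor) @ l2_normalize(target).T / temperature  # (B, B)
    idx = np.arange(logits.shape[0])
    anchor_to_target = -log_softmax(logits, axis=1)[idx, idx].mean()  # softmax over targets
    target_to_anchor = -log_softmax(logits, axis=0)[idx, idx].mean()  # softmax over anchors
    return float(0.5 * (anchor_to_target + target_to_anchor))


def image_alignment_loss(image_emb, signal_emb, text_emb, temperature=0.07):
    """Hypothetical combination: align images with signals and with texts, then sum."""
    return (symmetric_info_nce(image_emb, signal_emb, temperature)
            + symmetric_info_nce(image_emb, text_emb, temperature))


# Toy usage with random embeddings (batch of 8, dimension 32).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 32))
aligned_signal = image_emb + 0.1 * rng.normal(size=(8, 32))  # near-copies: well aligned
random_text = rng.normal(size=(8, 32))                        # unrelated: poorly aligned

aligned_loss = symmetric_info_nce(image_emb, aligned_signal)
random_loss = symmetric_info_nce(image_emb, random_text)
total_loss = image_alignment_loss(image_emb, aligned_signal, random_text)
print(aligned_loss, random_loss, total_loss)
```

Well-aligned pairs drive the loss toward zero, while unrelated embeddings yield a much larger value, which is the pressure that pulls each image embedding toward its own signal and text counterparts.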

Paper Structure

This paper contains 25 sections, 12 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: (a) Common ECG acquisition in a clinical environment, where an ECG machine is connected to a printer to produce a paper printout for the patient. Optionally, a local scanner may be connected for digital archiving (as images or PDFs), depending on the ECG machine type and hardware system. (b) A patient capturing ECG printouts with a smartphone camera (or a scanner), producing an image-based ECG for their own long-term archive. (c) Possible remote review of ECG images during a consultation service, in which expert clinicians interpret detailed cardiac patterns from shared ECG images.
  • Figure 2: Illustration of ECG-Scan. We present a multimodal framework that aligns ECG images, signals, and clinical texts through a dual physiological-aware alignment strategy.
  • Figure 3: Illustration of Einthoven’s Law and Goldberger’s equations for limb leads of ECG signals.
  • Figure 4: t-SNE visualizations of representations learned by different ECG encoders on the CSN test set. ECG-Scan takes image input, while the other encoders use 10-second ECG signals. Each color represents a cardiac diagnosis category.
  • Figure 5: Examples of image augmentations during pretraining.
  • ...and 1 more figure
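Figure 3 refers to Einthoven's law and Goldberger's equations, which tie the six limb leads together: I + III = II, aVR = -(I + II)/2, aVL = (I - III)/2, and aVF = (II + III)/2. One plausible reading of the soft-lead constraints is a penalty on how far reconstructed limb leads deviate from these identities, added to the training loss with a weight instead of being enforced exactly. The sketch below assumes a (12, T) array in the standard lead order I, II, III, aVR, aVL, aVF, V1 to V6; the function names are hypothetical and this is not necessarily the authors' formulation.

```python
import numpy as np

# Assumed row order of a (12, T) ECG array: I, II, III, aVR, aVL, aVF, V1..V6.
I, II, III, AVR, AVL, AVF = range(6)


def limb_lead_residuals(ecg: np.ndarray) -> np.ndarray:
    """Return (4, T) residuals of Einthoven's law and Goldberger's equations.
    Every row is zero for a physiologically consistent recording."""
    i, ii, iii = ecg[I], ecg[II], ecg[III]
    return np.stack([
        i + iii - ii,               # Einthoven: I + III = II
        ecg[AVR] + (i + ii) / 2,    # Goldberger: aVR = -(I + II) / 2
        ecg[AVL] - (i - iii) / 2,   # Goldberger: aVL = (I - III) / 2
        ecg[AVF] - (ii + iii) / 2,  # Goldberger: aVF = (II + III) / 2
    ])


def soft_lead_penalty(reconstructed: np.ndarray) -> float:
    """Mean squared violation of the limb-lead identities (hypothetical regularizer)."""
    return float(np.mean(limb_lead_residuals(reconstructed) ** 2))


# Build a consistent toy recording from random leads I and II.
rng = np.random.default_rng(0)
T = 500
lead_i, lead_ii = rng.normal(size=T), rng.normal(size=T)
lead_iii = lead_ii - lead_i
consistent = np.vstack([
    lead_i, lead_ii, lead_iii,
    -(lead_i + lead_ii) / 2, (lead_i - lead_iii) / 2, (lead_ii + lead_iii) / 2,
    rng.normal(size=(6, T)),  # precordial leads V1..V6 are not constrained here
])

noisy = consistent.copy()
noisy[III] += 0.5  # corrupt lead III so several identities are violated

print(soft_lead_penalty(consistent), soft_lead_penalty(noisy))
```

In a training loop, such a term would typically be scaled by a weight and added to the reconstruction loss, so the model is nudged toward inter-lead consistency without hard-coding it.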