Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture

Sehun Kim

TL;DR

ECG-JEPA addresses the scarcity of labeled ECG data by learning semantic representations through latent-space patch prediction in a joint-embedding predictive framework. The method adds a domain-informed Cross-Pattern Attention and a teacher-student-predictor setup to learn robust features from unlabeled 12-lead ECG data, pretrained on ~180k recordings. It achieves state-of-the-art performance across linear evaluation, fine-tuning, low-shot learning, and reduced-lead scenarios on PTB-XL and CPSC2018, and can recover classical ECG features like heart rate and QRS duration from the learned representations. The approach is scalable, efficient (training ~100 epochs in ~22 hours on a single RTX 3090), and demonstrates that 8-lead pretraining can suffice, broadening applicability to wearable and resource-limited settings.

Abstract

The electrocardiogram (ECG) captures the heart's electrical signals, offering valuable information for diagnosing cardiac conditions. However, the scarcity of labeled data makes it challenging to fully leverage supervised learning in the medical domain. Self-supervised learning (SSL) offers a promising solution, enabling models to learn from unlabeled data and uncover meaningful patterns. In this paper, we show that masked modeling in the latent space can be a powerful alternative to existing self-supervised methods in the ECG domain. We introduce ECG-JEPA, an SSL model for 12-lead ECG analysis that learns semantic representations of ECG data by predicting in the latent space, bypassing the need to reconstruct raw signals. This approach offers several advantages in the ECG domain: (1) it avoids producing unnecessary details, such as noise, which is common in ECG; and (2) it addresses the limitations of a naïve L2 loss between raw signals. Another key contribution is the introduction of Cross-Pattern Attention (CroPA), a specialized masked attention mechanism tailored for 12-lead ECG data. ECG-JEPA is trained on the union of several open ECG datasets, totaling approximately 180,000 samples, and achieves state-of-the-art performance in various downstream tasks including ECG classification and feature prediction. Our code is openly available at https://github.com/sehunfromdaegu/ECG_JEPA.
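
To make "predicting in the latent space" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy scalar encoder (`encode`), a scalar predictor weight, a squared-error loss, and an exponential-moving-average teacher update; all of these names and choices are hypothetical placeholders for illustration only. The point it shows is that the loss compares predicted and teacher representations of the masked patches, never the raw signal.

```python
def encode(patch, weight):
    # Hypothetical stand-in encoder: maps a patch to a 1-D "latent"
    # by scaling its mean. A real encoder would be a transformer.
    return weight * sum(patch) / len(patch)


def latent_prediction_loss(patches, masked_idx, student_w, teacher_w, predictor_w):
    """Mean squared error between predicted and teacher latents on masked patches."""
    visible = [p for i, p in enumerate(patches) if i not in masked_idx]
    # The student sees only the visible patches; the context is their mean latent.
    context = sum(encode(p, student_w) for p in visible) / len(visible)
    loss = 0.0
    for i in masked_idx:
        target = encode(patches[i], teacher_w)  # teacher encodes the masked patch
        pred = predictor_w * context            # predictor guesses that latent
        loss += (pred - target) ** 2            # compared in latent space only
    return loss / len(masked_idx)


def ema_update(teacher_w, student_w, momentum=0.99):
    # Assumed teacher update rule: exponential moving average of the student.
    return momentum * teacher_w + (1.0 - momentum) * student_w
```

Because the target is a representation rather than the waveform, noise in a masked patch does not have to be reproduced for the loss to be small, which is the first advantage the abstract lists.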

Paper Structure

This paper contains 36 sections, 6 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: 12-lead ECG with baseline wander artifact.
  • Figure 2: Key ECG Features.
  • Figure 3: ECG-JEPA training overview. For illustration, we use $L = 3$, $N = 5$ subintervals and $Q = 3$ unmasked subintervals.
  • Figure 4: Cross-Pattern Attention (CroPA). The patch in the middle attends only to the colored patches.
  • Figure 5: Squares following the encoder represent the representations of ECG patches. The representations are subsequently averaged through a pooling layer, with the resulting vector (highlighted in cyan) serving as an abstract representation of the ECG data.
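
The captions of Figures 4 and 5 can be illustrated with a small sketch. It is written under an assumption the captions do not spell out: that under CroPA a patch may attend only to patches on the same lead or at the same time index. The function names `cropa_mask` and `mean_pool` are hypothetical, and the sizes in the usage note reuse the $L = 3$, $N = 5$ illustration from Figure 3, taking $L$ to be the number of leads.

```python
def cropa_mask(num_leads, num_patches):
    """Boolean attention mask over flattened (lead, time) patches.

    mask[q][k] is True when query patch q may attend to key patch k.
    Assumed rule: same lead OR same time index.
    """
    n = num_leads * num_patches
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_lead, q_time = divmod(q, num_patches)
        for k in range(n):
            k_lead, k_time = divmod(k, num_patches)
            mask[q][k] = (q_lead == k_lead) or (q_time == k_time)
    return mask


def mean_pool(patch_reprs):
    """Average per-patch representation vectors into a single vector."""
    dim = len(patch_reprs[0])
    return [sum(r[d] for r in patch_reprs) / len(patch_reprs) for d in range(dim)]
```

With `cropa_mask(3, 5)`, the middle patch (lead 1, time index 2) can attend to the 5 patches on its own lead and the 3 patches at its own time index, 7 distinct patches in total, instead of all 15. `mean_pool` corresponds to the pooling layer in Figure 5, which collapses the patch representations into one vector per recording.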