TolerantECG: A Foundation Model for Imperfect Electrocardiogram

Huynh Dang Nguyen, Trong-Thang Pham, Ngan Le, Van Nguyen

TL;DR

TolerantECG tackles the challenge of diagnosing cardiac conditions from imperfect ECG data by learning robust multimodal representations that align ECG signals with detailed text reports. The framework combines Cardiac Feature Retrieval (CFR) to generate informative diagnostic descriptions and a dual-mode distillation scheme (DuoDistill) to handle lead-missing and noisy signals, with alternating training to reinforce robustness. Empirical results on PTB-XL and MIT-BIH show state-of-the-art or near state-of-the-art performance across varying conditions, highlighting strong transferability and resilience to common ECG artifacts. This work advances practical ECG analysis by enabling reliable interpretation with incomplete or degraded signals, reducing diagnostic uncertainty in real-world settings.

Abstract

The electrocardiogram (ECG) is an essential and effective tool for diagnosing heart diseases. However, its effectiveness can be compromised by noise or unavailability of one or more leads of the standard 12-lead recordings, resulting in diagnostic errors or uncertainty. To address these challenges, we propose TolerantECG, a foundation model for ECG signals that is robust to noise and capable of functioning with arbitrary subsets of the standard 12-lead ECG. TolerantECG training combines contrastive and self-supervised learning frameworks to jointly learn ECG signal representations alongside their corresponding knowledge-retrieval-based text report descriptions and corrupted or lead-missing signals. Comprehensive benchmarking results demonstrate that TolerantECG consistently ranks as the best or second-best performer across various ECG signal conditions and class levels in the PTB-XL dataset, and achieves the highest performance on the MIT-BIH Arrhythmia Database.

Paper Structure

This paper contains 20 sections, 1 equation, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: The Cardiac Feature Retrieval (CFR) pipeline uses information gathered from the Life in the Fastlane website to retrieve the relevant waveform criteria for each cardiac disease. Diagnoses are encoded into vector embeddings and stored in a vector database.
  • Figure 2: Example of a 12-lead ECG recording with its associated metadata and diagnoses, and the detailed report description constructed by our method.
  • Figure 3: TolerantECG training framework, which comprises (1) a Report Alignment module that aligns ECG signals with detailed text reports and (2) a self-supervised learning module with lead-missing distillation and noise distillation (DuoDistill). In DuoDistill, the student model (ECG Encoder) and two teacher models ($\overline{\textit{ECG Encoder}}$ and $\overline{\overline{\textit{ECG Encoder}}}$) have the same architecture, and the distillation steps occur in a ping-pong manner, alternating between the two distillation tasks. The student processes both Minor and Major augmentations, where Minor augmentation retains a large portion of the original signal while Major augmentation preserves only a small portion. These augmentations involve lead masking and noise addition. A stop-gradient (sg) operator is applied to the teachers so that gradients propagate only through the student. The total loss combines the contrastive learning loss and the self-supervised loss. The ECG Encoder uses a randomly initialized ConvNeXt V2, while a pretrained BioLinkBERT is used as the Text Encoder.
  • Figure 4: AUC (%) for different numbers of available leads on the PTB-XL All classification task.
  • Figure 5: AUC (%) across different signal-to-noise ratio (SNR) levels on the PTB-XL All classification task.
  • ...and 1 more figures
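
The Figure 3 caption describes Minor and Major augmentations built from lead masking and noise addition, and Figure 5 reports results by SNR level. The following is a minimal sketch of what such augmentations could look like, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names (`mask_leads`, `add_noise_at_snr`, `measured_snr_db`), the use of additive Gaussian noise, zero-filling of masked leads, and the particular lead counts and SNR values chosen for the Minor and Major views.

```python
import numpy as np

def mask_leads(ecg, keep, rng):
    """Zero out all but `keep` randomly chosen leads.

    ecg has shape (n_leads, n_samples). Zero-filling is an assumption;
    the source only says that leads are masked.
    """
    n_leads = ecg.shape[0]
    kept = rng.choice(n_leads, size=keep, replace=False)
    mask = np.zeros(n_leads, dtype=bool)
    mask[kept] = True
    return ecg * mask[:, None], mask

def add_noise_at_snr(ecg, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB (assumed noise model)."""
    signal_power = np.mean(ecg ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=ecg.shape)
    return ecg + noise

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)

# Synthetic stand-in for a 12-lead recording: 10 s at 500 Hz, one sinusoid per lead.
t = np.linspace(0, 10, 5000, endpoint=False)
ecg = np.stack([np.sin(2 * np.pi * (1 + 0.1 * k) * t) for k in range(12)])

# Minor view: most of the signal survives (hypothetical: 10 leads, 20 dB SNR).
minor_noisy = add_noise_at_snr(ecg, 20.0, rng)
minor, minor_mask = mask_leads(minor_noisy, keep=10, rng=rng)

# Major view: little of the signal survives (hypothetical: 2 leads, 0 dB SNR).
major_noisy = add_noise_at_snr(ecg, 0.0, rng)
major, major_mask = mask_leads(major_noisy, keep=2, rng=rng)
```

In this sketch the noise is scaled from the power of the whole recording, so the target SNR holds on average across leads rather than per lead. A per-lead variant would compute `signal_power` along the sample axis instead.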