ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Han Yu, Peikun Guo, Akane Sano

TL;DR

A new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals, validated on downstream tasks including arrhythmia detection and ECG-based subject identification.

Abstract

Deep learning has improved the accuracy and efficiency of electrocardiogram (ECG) analysis in cardiac healthcare diagnostics. By leveraging the capabilities of deep learning in semantic understanding, especially in feature extraction and representation learning, this study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals. Our framework comprises two key components: the Cardio Query Assistant (CQA) and the ECG Semantics Integrator (ESI). CQA integrates a retrieval-augmented generation (RAG) pipeline that leverages large language models (LLMs) and external medical knowledge to generate detailed textual descriptions of ECGs. The generated text is enriched with information about demographics and waveform patterns. ESI combines contrastive and captioning losses to pretrain ECG encoders for enhanced representations. We validate our approach on several downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. The baselines include supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches.
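To make the phrase "contrastive and captioning loss" concrete, the following is a minimal sketch of how two such terms are commonly combined. It is not the authors' implementation: the symmetric InfoNCE form, the temperature of 0.07, the token-level cross-entropy, and the weights `w_con` and `w_cap` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE (assumed form): the i-th ECG and i-th text are the positive pair."""
    ecg = [l2_normalize(v) for v in ecg_emb]
    txt = [l2_normalize(v) for v in text_emb]
    n = len(ecg)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt] for e in ecg]
    ecg_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_ecg = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)

def captioning_loss(token_logits, target_ids):
    """Mean cross-entropy of per-position vocabulary logits against the target token ids."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids)]
    return sum(losses) / len(losses)

def combined_loss(ecg_emb, text_emb, token_logits, target_ids, w_con=1.0, w_cap=1.0):
    """Weighted sum of the two terms; the weights are hypothetical hyperparameters."""
    return (w_con * contrastive_loss(ecg_emb, text_emb)
            + w_cap * captioning_loss(token_logits, target_ids))
```

In this sketch the contrastive term pulls each ECG embedding toward the embedding of its own description and away from the other descriptions in the batch, while the captioning term rewards the model for predicting the description tokens themselves.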

Paper Structure

This paper contains 42 sections, 3 equations, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: The Cardio Query Assistant (CQA) framework uses a novel knowledge-based approach to generate detailed, clinically relevant textual descriptions of ECG signals, translating ECG conditions into enriched descriptions of ECG waveform patterns.
  • Figure 2: Example of a 12-lead ECG signal and its associated metadata. The left side displays the 12-lead ECG waveform, which provides a visual representation of the heart's electrical activity. The right side includes relevant patient demographic information (age, gender), a summary of ECG metadata including high-level clinical findings (e.g., Sinus rhythm, Abnormal R-wave progression), and a detailed textual description of the ECG conditions generated with the CQA pipeline.
  • Figure 3: The ECG Semantics Integrator (ESI) pairs an ECG signal encoder with a text encoder and trains them with captioning and contrastive losses to obtain unified representations. The architecture learns from the alignment between detailed textual prompts and the corresponding ECG waveform data, aiming to capture nuanced clinical insights for enhanced diagnostic tasks.
  • Figure 4: Comparison of the proposed ECG Semantics Integrator (ESI) with the best-performing baseline methods, including supervised models and signal-focused self-supervised learning (SSL) pretrained models. Unlike the baselines, ESI is a multimodal contrastive pretraining framework that leverages both ECG signals and their corresponding textual descriptions to learn enhanced ECG representations. Arrhythmia diagnosis and identification are evaluated on datasets including PTB-XL and ICBEB, using area under the ROC curve (AUC) and accuracy (ACC) as metrics.
  • Figure 5: Linear-probing performance (AUC) in arrhythmia diagnosis on PTB-XL and ICBEB, using the pretrained encoders with varying numbers of training samples.
  • ...and 2 more figures
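
The linear-probing protocol named in the Figure 5 caption can be sketched as follows: the pretrained encoder stays frozen and only a linear classifier is fitted on its output features, repeated for several training-set sizes. Everything here is an illustrative assumption rather than the paper's setup: `frozen_encoder` is a hypothetical stand-in that computes two summary statistics, the data are synthetic, and accuracy is reported instead of AUC to keep the example short.

```python
import math
import random

def frozen_encoder(signal):
    """Hypothetical stand-in for a pretrained ECG encoder; it is never updated."""
    mean = sum(signal) / len(signal)
    energy = sum(x * x for x in signal) / len(signal)
    return [mean, energy]

def make_dataset(n, rng):
    """Synthetic two-class signals: class k is uniform noise centred on k."""
    signals, labels = [], []
    for i in range(n):
        label = i % 2
        signals.append([label + rng.uniform(-0.5, 0.5) for _ in range(50)])
        labels.append(label)
    return signals, labels

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression on frozen features with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for k in range(dim):
                grad_w[k] += (p - y) * x[k]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, features, labels):
    """Fraction of samples whose thresholded linear score matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)

def probe_with_varying_samples(train_sizes=(10, 50, 200), seed=0):
    """Train one probe per training-set size and score each on a shared test set."""
    rng = random.Random(seed)
    test_signals, test_labels = make_dataset(100, rng)
    test_features = [frozen_encoder(s) for s in test_signals]
    results = {}
    for n in train_sizes:
        signals, labels = make_dataset(n, rng)
        features = [frozen_encoder(s) for s in signals]  # encoder output only
        w, b = train_linear_probe(features, labels)
        results[n] = accuracy(w, b, test_features, test_labels)
    return results
```

Calling `probe_with_varying_samples()` returns a dictionary mapping each training-set size to held-out accuracy. Because only the linear layer is trained, differences across encoders in such a comparison reflect the quality of the frozen representations.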