Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, Rossella Arcucci

TL;DR

This paper tackles label-efficient ECG classification by fusing ECG signals with associated clinical reports through multimodal learning, enabling zero-shot classification. It introduces Cross-Modal Alignment (CMA) and Uni-Modal Alignment (UMA) during training and Clinical Knowledge Enhanced Prompt Engineering (CKEPE) at test time to exploit expert-verified knowledge databases for prompt generation, with the overall objective $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CMA}} + \mathcal{L}_{\mathrm{UMA}}$. Pretraining on the large MIMIC-ECG dataset yields transferable ECG representations, achieving an average zero-shot AUC of $75.2\%$ across six datasets, outperforming linear-probed eSSL with $10\%$ data by $3.2\%$. The work also provides the first public benchmark comparing MERL to 10 eSSL methods across diverse ECG datasets and emphasizes the importance of latent augmentation, robust text encoders, and verified clinical knowledge in improving zero-shot performance.
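The objective $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CMA}} + \mathcal{L}_{\mathrm{UMA}}$ can be illustrated with a minimal sketch. This summary does not give the exact form of either term, so the sketch assumes both are CLIP-style symmetric InfoNCE losses: CMA between ECG and report embeddings, UMA between two views of the same ECG embedding. The temperature `tau` and all function names are illustrative assumptions, not taken from the paper or its code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Assumes v is a non-zero vector.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(anchors, targets, tau=0.07):
    """Mean cross-entropy of matching anchors[i] to targets[i] within the batch.

    tau is an assumed temperature; the summary does not state one.
    """
    a = [normalize(v) for v in anchors]
    t = [normalize(v) for v in targets]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, tj) / tau for tj in t]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(a)

def symmetric_info_nce(x, y, tau=0.07):
    # Average the x-to-y and y-to-x directions.
    return 0.5 * (info_nce(x, y, tau) + info_nce(y, x, tau))

def total_loss(ecg_emb, report_emb, ecg_view1, ecg_view2, tau=0.07):
    """L_total = L_CMA + L_UMA under the assumptions stated above."""
    l_cma = symmetric_info_nce(ecg_emb, report_emb, tau)   # cross-modal term
    l_uma = symmetric_info_nce(ecg_view1, ecg_view2, tau)  # uni-modal term
    return l_cma + l_uma
```

With toy embeddings, a batch whose ECG and report vectors are correctly paired yields a lower cross-modal term than the same batch with the pairs swapped, which is the behaviour the alignment objective rewards.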

Abstract

Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge that can be found in reports. This oversight and the requirement for annotated samples for downstream tasks limit eSSL's versatility. In this work, we address these issues with the Multimodal ECG Representation Learning (MERL) framework. Through multimodal learning on ECG records and associated reports, MERL is capable of performing zero-shot ECG classification with text prompts, eliminating the need for training data in downstream tasks. At test time, we propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach, which uses Large Language Models (LLMs) to exploit external expert-verified clinical knowledge databases, generating more descriptive prompts and reducing hallucinations in LLM-generated content to boost zero-shot classification. Based on MERL, we perform the first benchmark across six public ECG datasets, showing the superior performance of MERL compared against eSSL methods. Notably, MERL achieves an average AUC score of 75.2% in zero-shot classification (without training data), 3.2% higher than linear-probed eSSL methods with 10% annotated training data, averaged across all six datasets. Code and models are available at https://github.com/cheliu-computation/MERL
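Zero-shot classification with text prompts can be sketched as follows, assuming the common approach of scoring an ECG embedding against one prompt embedding per class by cosine similarity. The function names, class labels, and toy vectors are hypothetical placeholders; in practice the embeddings would come from the trained ECG and text encoders, which are not reproduced here.

```python
import math

def cosine(u, v):
    # Assumes both vectors are non-zero and of equal length.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def zero_shot_scores(ecg_embedding, prompt_embeddings):
    """Map each class name to the similarity between the ECG and its prompt."""
    return {name: cosine(ecg_embedding, emb)
            for name, emb in prompt_embeddings.items()}

def zero_shot_predict(ecg_embedding, prompt_embeddings):
    """Return the class whose prompt embedding is closest to the ECG embedding."""
    scores = zero_shot_scores(ecg_embedding, prompt_embeddings)
    return max(scores, key=scores.get)

# Toy stand-ins for encoder outputs (hypothetical values).
prompts = {
    "atrial fibrillation": [1.0, 0.0],
    "normal ECG": [0.0, 1.0],
}
prediction = zero_shot_predict([0.9, 0.1], prompts)
```

No downstream training data is involved: adding a new class only requires a new prompt embedding. For multi-label evaluation with AUC, the per-class scores from `zero_shot_scores` would be used directly rather than the single top prediction.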

Paper Structure (22 sections, 3 equations, 6 figures, 13 tables)


Figures (6)

  • Figure 1: We demonstrate that MERL, even without training samples or prompt engineering, surpasses the best-performing eSSL method with 1% data linear probing (see Tab. \ref{tab:linear-cls}). Additionally, zero-shot MERL enhanced with our CKEPE outperforms the best eSSL results obtained from 10% data linear probing.
  • Figure 2: (a) Commonly used naive input-level data augmentation distorts semantics of ECG records, leading to sub-optimal representation learning performance. (b) Illustration of existing eSSL approaches. Their contrastive learning framework necessitates these naively augmented ECG signals. (c) Existing generative eSSL employs signal reconstruction as self-supervision task while being agnostic to the semantic meaning of ECG. (d) The proposed MERL, designed for multimodal ECG learning, leverages both ECG records and clinical reports for representation learning through Cross-Modal Alignment (CMA). MERL addresses the drawbacks of naive input-level augmentation by opting for latent augmentation (dropout) to prevent pattern corruption, and it enhances ECG learning through Uni-Modal Alignment (UMA). $\mathcal{F}_{e}$ denotes the ECG encoder, and $\mathcal{F}_r$ represents the text report encoder.
  • Figure 3: At test time, we design the CKEPE pipeline for generating more descriptive prompts via LLM for zero-shot classification. In particular, we leverage the capability of LLM to extract clinical knowledge from trustworthy external knowledge databases verified by clinicians, then restructure this knowledge (e.g., subtypes or attributes of cardiac conditions) for prompt generation, with less hallucination from LLM.
  • Figure 4: Left: Zero-shot MERL vs. linear-probed eSSL with 1% data. Right: Zero-shot MERL vs. linear-probed eSSL with 10% data. All results are reported as AUC scores.
  • Figure 5: Average linear probing performance on six datasets of MERL and other eSSL methods with scaled ECG backbones. For ST-MEM, a transformer-based method, we use ViT-Tiny, ViT-Small, and ViT-Base.
  • ...and 1 more figure
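The latent augmentation mentioned in the Figure 2 caption (dropout applied to the embedding rather than to the raw signal) can be sketched as below. This is a minimal illustration, not the authors' implementation: the dropout rate `p`, the seeding, and the function names are assumptions, and standard inverted dropout is used as the masking scheme.

```python
import random

def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest.

    Requires 0 <= p < 1. The rate p is an assumed hyperparameter.
    """
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def two_latent_views(embedding, p=0.1, seed=0):
    """Create two views of one ECG embedding using independent dropout masks.

    The raw ECG signal is never altered, so its waveform patterns stay intact;
    only the latent vector is perturbed.
    """
    rng = random.Random(seed)
    return dropout(embedding, p, rng), dropout(embedding, p, rng)
```

The two views returned for each ECG would serve as a positive pair for Uni-Modal Alignment, with views of other ECGs in the batch acting as negatives.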