Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, Rossella Arcucci
TL;DR
This paper tackles label-efficient ECG classification by fusing ECG signals with their associated clinical reports through multimodal learning, which enables zero-shot classification. It introduces Cross-Modal Alignment (CMA) and Uni-Modal Alignment (UMA) during training, and Clinical Knowledge Enhanced Prompt Engineering (CKEPE) at test time, which exploits expert-verified knowledge databases for prompt generation. The overall training objective is $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CMA}} + \mathcal{L}_{\mathrm{UMA}}$. Pretraining on the large MIMIC-ECG dataset yields transferable ECG representations, achieving an average zero-shot AUC of $75.2\%$ across six datasets, $3.2\%$ higher than eSSL methods linearly probed with $10\%$ of the annotated training data. The work also provides the first public benchmark comparing MERL with 10 eSSL methods across diverse ECG datasets, and highlights the importance of latent augmentation, robust text encoders, and verified clinical knowledge for zero-shot performance.
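The TL;DR states only that $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CMA}} + \mathcal{L}_{\mathrm{UMA}}$; it does not give the form of either term. The sketch below is a minimal illustration under our own assumptions, not MERL's implementation: $\mathcal{L}_{\mathrm{CMA}}$ is taken to be a CLIP-style symmetric InfoNCE loss between paired ECG and report embeddings, and $\mathcal{L}_{\mathrm{UMA}}$ is assumed, purely for illustration, to be the same loss applied to two augmented views of the ECG embeddings. The function names, the temperature of 0.07, and the way the views are produced are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(a: List[Vector], b: List[Vector], tau: float) -> List[List[float]]:
    """Pairwise cosine similarities between rows of a and rows of b, divided by tau."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) / tau for w in b_n] for u in a_n]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(a: List[Vector], b: List[Vector], tau: float = 0.07) -> float:
    """CLIP-style loss: average of the a->b and b->a contrastive cross-entropies."""
    logits = cosine_logits(a, b, tau)
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the b->a direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


def total_loss(ecg_emb: List[Vector], text_emb: List[Vector],
               ecg_view1: List[Vector], ecg_view2: List[Vector],
               tau: float = 0.07) -> float:
    """Hypothetical L_total = L_CMA + L_UMA for one batch of paired embeddings."""
    l_cma = symmetric_info_nce(ecg_emb, text_emb, tau)    # assumed cross-modal term
    l_uma = symmetric_info_nce(ecg_view1, ecg_view2, tau)  # assumed uni-modal term
    return l_cma + l_uma
```

With this form, a batch whose ECG and report embeddings are correctly paired gets a loss near zero, while swapping the pairs drives the loss up; the symmetric version penalizes mismatches in both retrieval directions.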
Abstract
Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge contained in the accompanying reports. This oversight, together with the need for annotated samples in downstream tasks, limits the versatility of eSSL. In this work, we address these issues with the Multimodal ECG Representation Learning (MERL) framework. Through multimodal learning on ECG records and their associated reports, MERL can perform zero-shot ECG classification with text prompts, eliminating the need for training data in downstream tasks. At test time, we propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach, which uses Large Language Models (LLMs) to exploit external expert-verified clinical knowledge databases, generating more descriptive prompts and reducing hallucinations in LLM-generated content to boost zero-shot classification. Based on MERL, we perform the first benchmark across six public ECG datasets, showing that MERL outperforms eSSL methods. Notably, MERL achieves an average AUC score of 75.2% in zero-shot classification (without training data), 3.2% higher than linearly probed eSSL methods with 10% annotated training data, averaged across all six datasets. Code and models are available at https://github.com/cheliu-computation/MERL
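The abstract describes zero-shot classification with text prompts but does not spell out the scoring rule. A common CLIP-style recipe, sketched below under our own assumptions, embeds several descriptive prompts per class (for example, prompts produced by a CKEPE-like step), averages them into one class embedding, and scores an ECG embedding by temperature-scaled cosine similarity followed by a softmax over classes. Each class is then evaluated with a rank-based AUC, and the per-class AUCs are averaged. The encoders are not shown; `class_embedding`, `zero_shot_scores`, `binary_auc`, the temperature of 0.07, and the toy vectors are hypothetical and are not MERL's released API.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def class_embedding(prompt_embs: List[Vector]) -> Vector:
    """Average several prompt embeddings for one class, then re-normalize."""
    dim = len(prompt_embs[0])
    mean = [sum(p[d] for p in prompt_embs) / len(prompt_embs) for d in range(dim)]
    return normalize(mean)


def zero_shot_scores(ecg_emb: Vector, class_prompts: Dict[str, List[Vector]],
                     tau: float = 0.07) -> Dict[str, float]:
    """Softmax over classes of cosine(ECG, class embedding) / tau (assumed scoring rule)."""
    e = normalize(ecg_emb)
    sims = {c: sum(x * y for x, y in zip(e, class_embedding(p))) / tau
            for c, p in class_prompts.items()}
    m = max(sims.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in sims.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}


def binary_auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based AUC: P(positive score > negative score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `zero_shot_scores([1.0, 0.1], {"normal": [[1.0, 0.0], [0.9, 0.1]], "afib": [[0.0, 1.0]]})` assigns almost all probability to `normal`, because the ECG vector points nearly along that class's averaged prompt direction. A dataset-level score such as an average AUC would then be the mean of `binary_auc` over classes.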
