
Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs

Che Liu, Cheng Ouyang, Zhongwei Wan, Haozhe Wang, Wenjia Bai, Rossella Arcucci

TL;DR

K-MERL tackles two practical challenges of multimodal ECG learning: arbitrary lead inputs and noisy alignment between ECG signals and free-text reports. To ease alignment, it incorporates structured cardiac knowledge mined from the free-text ECG reports. The method employs lead-specific tokenization and spatial-temporal embeddings, together with dynamic lead masking and lead-independent segment masking, to capture lead-wise spatial-temporal patterns while handling variable lead availability. It introduces a knowledge-enhanced training objective that aligns ECG features with cardiac-related entities through a cardiac query network and a dual loss that combines a contrastive ECG-text objective with entity supervision. Across six public datasets, K-MERL achieves state-of-the-art zero-shot classification and linear probing results without relying on prompt engineering, is notably robust to partial-lead inputs, and substantially outperforms prior multimodal approaches.
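To make the lead-wise processing concrete, the sketch below splits each lead into its own token sequence, adds a per-lead spatial embedding and a temporal embedding shared by all leads, and then applies a random lead subset (dynamic lead masking) combined with per-lead segment masking. It is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the segment length, embedding width, masking ratio, the single shared linear projection, and the helper names (`tokenize_leads`, `dynamic_lead_mask`, `segment_mask`, `embed`) are all hypothetical.

```python
import numpy as np

N_LEADS = 12   # standard 12-lead ECG
SEG_LEN = 50   # assumed samples per token (segment length)


def tokenize_leads(ecg: np.ndarray, seg_len: int = SEG_LEN) -> np.ndarray:
    """Split every lead separately into non-overlapping segments.

    ecg: (n_leads, n_samples) -> tokens: (n_leads, n_segments, seg_len)
    """
    n_leads, n_samples = ecg.shape
    n_seg = n_samples // seg_len
    return ecg[:, : n_seg * seg_len].reshape(n_leads, n_seg, seg_len)


def dynamic_lead_mask(rng: np.random.Generator, n_leads: int = N_LEADS) -> np.ndarray:
    """Keep a random, non-empty subset of leads (True = lead visible)."""
    n_keep = rng.integers(1, n_leads + 1)
    keep = np.zeros(n_leads, dtype=bool)
    keep[rng.choice(n_leads, size=n_keep, replace=False)] = True
    return keep


def segment_mask(rng: np.random.Generator, n_leads: int, n_seg: int, ratio: float = 0.3) -> np.ndarray:
    """Mask segments independently for each lead (True = segment visible)."""
    return rng.random((n_leads, n_seg)) >= ratio


def embed(tokens, w_proj, lead_emb, time_emb):
    """Project segments, then add spatial (per-lead) and temporal (shared) embeddings."""
    x = tokens @ w_proj                # (L, S, P) @ (P, D) -> (L, S, D)
    x = x + lead_emb[:, None, :]       # spatial: one vector per lead
    x = x + time_emb[None, :, :]       # temporal: same vectors for every lead
    return x


rng = np.random.default_rng(0)
ecg = rng.standard_normal((N_LEADS, 5000))   # e.g. 10 s at 500 Hz (assumed)
tokens = tokenize_leads(ecg)                 # (12, 100, 50)

D = 64                                       # assumed embedding width
w_proj = rng.standard_normal((SEG_LEN, D)) * 0.02
lead_emb = rng.standard_normal((N_LEADS, D)) * 0.02
time_emb = rng.standard_normal((tokens.shape[1], D)) * 0.02
x = embed(tokens, w_proj, lead_emb, time_emb)

# A token is visible only if its lead is kept AND its segment is not masked.
lead_keep = dynamic_lead_mask(rng)
visible = lead_keep[:, None] & segment_mask(rng, N_LEADS, tokens.shape[1])
encoder_input = x[visible]                   # (n_visible_tokens, D), fed to a Transformer
```

Because every visible token carries its own lead embedding, the same gathering step could serve at inference when only some leads are recorded: the missing leads are simply left out of the mask and the encoder receives a shorter sequence.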

Abstract

Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
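The "structured knowledge" step can be pictured as post-processing over LLM outputs. The sketch below assumes the LLM has already produced (1) raw entities per report, (2) a map from surface forms to unique canonical names, and (3) a subtype-to-superclass map, mirroring the three stages in Figure 3, and turns them into multi-hot entity targets. The example entities, both maps, the helper names, and the choice to mark a superclass positive whenever one of its subtypes appears are illustrative assumptions; no LLM call is shown.

```python
# Stage 1 (assumed LLM output): raw cardiac entities extracted per report.
raw_entities = [
    ["sinus rhythm", "1st degree AV block"],
    ["Sinus rhythm", "first-degree atrioventricular block", "LBBB"],
    ["atrial fibrillation", "left bundle branch block"],
]

# Stage 2 (assumed LLM output): lower-cased surface form -> unique canonical name.
synonyms = {
    "sinus rhythm": "sinus rhythm",
    "1st degree av block": "first degree AV block",
    "first-degree atrioventricular block": "first degree AV block",
    "lbbb": "left bundle branch block",
    "left bundle branch block": "left bundle branch block",
    "atrial fibrillation": "atrial fibrillation",
}

# Stage 3 (assumed LLM output): subtype -> superclass.
superclass = {
    "first degree AV block": "conduction disturbance",
    "left bundle branch block": "conduction disturbance",
    "atrial fibrillation": "arrhythmia",
}


def structure_report(entities: list[str]) -> set[str]:
    """Canonicalize one report's entities and add the superclass of each subtype."""
    names = {synonyms[e.strip().lower()] for e in entities}
    names |= {superclass[n] for n in names if n in superclass}
    return names


def multi_hot(names: set[str], index: dict[str, int]) -> list[int]:
    """Encode a set of entity names as a 0/1 target vector over the vocabulary."""
    target = [0] * len(index)
    for name in names:
        target[index[name]] = 1
    return target


structured = [structure_report(r) for r in raw_entities]
vocab = sorted(set().union(*structured))
index = {name: i for i, name in enumerate(vocab)}
labels = [multi_hot(s, index) for s in structured]

print(vocab)
# -> ['arrhythmia', 'atrial fibrillation', 'conduction disturbance',
#     'first degree AV block', 'left bundle branch block', 'sinus rhythm']
print(labels)
# -> [[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 1, 0]]
```

Note that "1st degree AV block" and "first-degree atrioventricular block" collapse to one target, which is the kind of redundancy in free text that structured entities are meant to remove.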

Paper Structure

This paper contains 28 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Comparison between classical ECG multimodal learning and our K-MERL framework. (a): Classical approaches (e.g., MERL [merl]) are suboptimal: they process all leads in a lead-agnostic manner and naively align ECG signals directly with free-text reports. (b): K-MERL introduces lead-specific processing and lead and segment masking to capture spatial-temporal patterns unique to each lead. It also extracts cardiac-related entities from reports as structured knowledge and aligns them with ECG features to enhance multimodal learning, thereby reducing the complexity introduced by the grammatical structure of free-text reports.
  • Figure 2: Illustration of our lead-specific processing and handling of partial-lead input in K-MERL. (a): Lead-specific processing and masking during pre-training. The model employs lead-specific tokenization, spatial embeddings, and lead-agnostic temporal embeddings to capture spatial-temporal patterns for each lead (see the framework section). Dynamic lead masking simulates inputs with arbitrary combinations of leads, while segment masking encourages the framework to capture temporal patterns (see the masking section). (b): Handling partial-lead input during downstream tasks. When leads are missing, the model processes only the available leads using lead-specific embeddings, maintaining performance even with incomplete data.
  • Figure 3: Illustration of mining structured knowledge from free-text reports (see the section on mining structured knowledge). First, cardiac-related entities are extracted from free-text ECG reports using an open-source LLM (e.g., Llama3.1-70B-Instruct). Next, we query the LLM to merge duplicated or synonymous cardiac-related entities into a list of unique names. Finally, the LLM detects and aggregates subtypes into their respective superclasses, creating a structured hierarchy of cardiac-related entities.
  • Figure 4: Performance on zero-shot classification across six datasets, comparing K-MERL with previous ECG multimodal learning methods. Notably, we use the original disease category names as prompts for both K-MERL and MERL to ensure a fair comparison.
  • Figure 5: Comparison of K-MERL and MERL on seen and unseen classes, reporting (a) average AUC and (b) average F1 scores. Definitions are in the zero-shot classification section.
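Figure 1(b) and the TL;DR describe aligning ECG features both with report text and with cardiac entities. A minimal NumPy sketch of such a dual objective follows: a symmetric contrastive (InfoNCE) loss between pooled ECG and report embeddings, plus multi-label binary cross-entropy on entity logits from a toy single-step cross-attention "query network" with one query vector per entity. The temperature, the weight `lam`, mean pooling, the random stand-in embeddings, and the query-network design are assumptions, not details confirmed by this summary.

```python
import numpy as np


def log_softmax(z: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of ecg_emb is paired with row i of text_emb."""
    logits = l2_normalize(ecg_emb) @ l2_normalize(text_emb).T / temperature  # (B, B)
    diag = np.arange(logits.shape[0])
    ecg_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    text_to_ecg = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (ecg_to_text + text_to_ecg)


def query_network(ecg_tokens, entity_queries, w_out):
    """Toy cardiac query network: each entity query cross-attends over ECG tokens.

    ecg_tokens: (B, T, D), entity_queries: (K, D), w_out: (D,) -> entity logits (B, K)
    """
    scale = np.sqrt(ecg_tokens.shape[-1])
    scores = np.einsum("kd,btd->bkt", entity_queries, ecg_tokens) / scale
    attn = np.exp(log_softmax(scores, axis=-1))            # attention weights over tokens
    pooled = np.einsum("bkt,btd->bkd", attn, ecg_tokens)   # one pooled vector per entity
    return pooled @ w_out


def entity_loss(logits, targets):
    """Multi-label binary cross-entropy with logits (stable form)."""
    return np.mean(np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits))))


rng = np.random.default_rng(0)
B, T, D, K = 4, 20, 32, 6                         # batch, tokens, width, entities (assumed)
ecg_tokens = rng.standard_normal((B, T, D))       # stand-in for ECG encoder outputs
ecg_emb = ecg_tokens.mean(axis=1)                 # global ECG embedding (mean pooling, assumed)
text_emb = rng.standard_normal((B, D))            # stand-in for report embeddings
entity_queries = rng.standard_normal((K, D))      # one query vector per entity
w_out = rng.standard_normal(D) / np.sqrt(D)
targets = rng.integers(0, 2, size=(B, K)).astype(float)  # multi-hot entity labels

lam = 1.0                                          # assumed weight of the entity term
entity_logits = query_network(ecg_tokens, entity_queries, w_out)
total = contrastive_loss(ecg_emb, text_emb) + lam * entity_loss(entity_logits, targets)
```

At evaluation time, the same cosine-similarity scoring could compare an ECG embedding against embeddings of category-name prompts instead of report embeddings, which is one way to read the zero-shot setup in Figure 4.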