Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs
Che Liu, Cheng Ouyang, Zhongwei Wan, Haozhe Wang, Wenjia Bai, Rossella Arcucci
TL;DR
K-MERL tackles two practical challenges in multimodal ECG learning, arbitrary lead inputs and noisy free-text alignment, by incorporating structured cardiac knowledge mined from free-text ECG reports. The method employs lead-specific tokenization and spatial-temporal embeddings, along with dynamic lead masking and lead-independent segment masking, to capture lead-wise spatial-temporal patterns while handling variable lead availability. It introduces a knowledge-enhanced training objective that aligns ECG features with cardiac-related entities through a cardiac query network and a dual loss, which combines a contrastive ECG-text objective with entity supervision. Across six public datasets, K-MERL achieves state-of-the-art zero-shot classification and linear probing results without relying on prompt engineering, and it is notably robust to partial-lead inputs, with an average 16% AUC improvement over existing methods in partial-lead zero-shot classification.
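To make the dual loss concrete, the following is a minimal sketch of how a contrastive ECG-text term and an entity-supervision term could be combined. It is not the authors' implementation. The symmetric InfoNCE form, the multi-label binary cross-entropy form, the temperature value, and the names `symmetric_contrastive_loss`, `entity_bce_loss`, `dual_loss`, and `entity_weight` are all assumptions made for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(sim, temperature=0.07):
    """Assumed contrastive term: symmetric InfoNCE.

    sim[i][j] is the similarity between ECG i and report j;
    matched ECG-report pairs sit on the diagonal.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*scaled)]
    ecg_to_text = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    text_to_ecg = -sum(log_softmax(transposed[i])[i] for i in range(n)) / n
    return 0.5 * (ecg_to_text + text_to_ecg)


def entity_bce_loss(logits, labels):
    """Assumed entity term: multi-label binary cross-entropy.

    logits[b][k] is the score for cardiac entity k in sample b;
    labels[b][k] is 1.0 if the entity is present, else 0.0.
    """
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def dual_loss(sim, logits, labels, entity_weight=1.0):
    """Weighted sum of the two terms; entity_weight is a hypothetical knob."""
    contrastive = symmetric_contrastive_loss(sim)
    entity = entity_bce_loss(logits, labels)
    return contrastive + entity_weight * entity
```

In this sketch the contrastive term pulls each ECG toward its own report and away from the other reports in the batch, while the entity term supervises each sample directly with structured labels, so the second term does not depend on the exact wording of the report.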
Abstract
Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
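
As an illustration of dynamic lead masking, here is a minimal sketch under stated assumptions rather than the paper's exact procedure. It assumes that the number of visible leads is drawn uniformly at random for each training sample and that masked leads are dropped while the original lead index is kept for lead-specific embeddings. The function names `sample_lead_mask` and `keep_visible_leads` and the `min_keep` parameter are hypothetical.

```python
import random

NUM_LEADS = 12  # standard 12-lead ECG


def sample_lead_mask(num_leads=NUM_LEADS, min_keep=1, rng=None):
    """Return a list of booleans; True marks a lead that stays visible.

    Assumption: the count of visible leads is uniform in [min_keep, num_leads].
    """
    if rng is None:
        rng = random.Random()
    num_keep = rng.randint(min_keep, num_leads)
    kept = set(rng.sample(range(num_leads), num_keep))
    return [i in kept for i in range(num_leads)]


def keep_visible_leads(ecg, mask):
    """Drop masked leads and keep (lead_index, signal) pairs.

    ecg is a list of per-lead sample lists. Keeping the index lets a
    lead-aware encoder attach the right lead embedding to each signal.
    """
    return [(i, lead) for i, (lead, keep) in enumerate(zip(ecg, mask)) if keep]


# Usage: a toy recording with 4 samples per lead.
rng = random.Random(0)
ecg = [[float(i)] * 4 for i in range(NUM_LEADS)]
mask = sample_lead_mask(rng=rng)
visible = keep_visible_leads(ecg, mask)
```

Resampling the mask for every training sample exposes the encoder to many lead subsets, which is one plausible way to prepare it for partial-lead inputs at inference time.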
