Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning
Jialu Tang, Tong Xia, Yuan Lu, Cecilia Mascolo, Aaqib Saeed
TL;DR
This work tackles few-shot ECG-language question answering by connecting a self-supervised ECG encoder to a frozen large language model through a trainable multimodal fusion mapper, so the system can adapt to new diagnostic questions from only a few labeled examples. Training follows a meta-learning paradigm (MAML) that optimizes for fast task adaptation across diverse question types and ECG attributes, and evaluation with multiple LLM backbones shows that the approach is LLM-agnostic. The evaluation rests on a novel ECG-QA benchmark that incorporates cross-domain tasks from PTB-XL and MIMIC-IV-ECG and provides a rigorous testbed for unseen attribute-answer pairs. Results show strong generalization to unseen tasks, robustness to lead reductions and language variations, and clear gains from larger LLMs, motivating practical deployment in data-constrained clinical settings and extensions to multimodal diagnostics. Overall, the framework bridges physiological signal processing and natural language reasoning to improve ECG interpretation under real-world data limitations.
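To make the MAML idea concrete, here is a minimal toy sketch in pure Python. It is not the authors' implementation. It assumes a scalar model y = w * x with a mean squared error loss, where the single weight `w` stands in for the trainable parameters (in the paper's setting, the fusion mapper, with the LLM frozen). All function names, learning rates, and the synthetic regression tasks are illustrative assumptions.

```python
import random

def mse_loss(w, xs, ys):
    """Mean squared error of the scalar model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """First derivative of mse_loss with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def mse_hess(xs):
    """Second derivative of mse_loss with respect to w (independent of w)."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def adapt(w, support, inner_lr):
    """Inner loop: one gradient step on the task's support set."""
    xs, ys = support
    return w - inner_lr * mse_grad(w, xs, ys)

def meta_gradient(w, support, query, inner_lr):
    """Outer-loop gradient: d/dw of the query loss after adaptation.

    Chain rule: dL_q(w')/dw = L_q'(w') * dw'/dw, with
    dw'/dw = 1 - inner_lr * L_s''(w).
    """
    xs_s, _ = support
    xs_q, ys_q = query
    w_adapted = adapt(w, support, inner_lr)
    return mse_grad(w_adapted, xs_q, ys_q) * (1.0 - inner_lr * mse_hess(xs_s))

def sample_task(rng, n_support=5, n_query=5):
    """Hypothetical task family: noiseless lines with a random slope."""
    slope = rng.uniform(-2.0, 2.0)
    def draw(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [slope * x for x in xs]
    return draw(n_support), draw(n_query)

def meta_train(steps=200, meta_batch=4, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Average meta-gradients over a batch of tasks and update the shared w."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        total = 0.0
        for _ in range(meta_batch):
            support, query = sample_task(rng)
            total += meta_gradient(w, support, query, inner_lr)
        w -= outer_lr * total / meta_batch
    return w
```

The point of the sketch is the two-level structure: `adapt` fits a task from its few support examples, while `meta_train` moves the shared initialization so that this single adaptation step yields a low loss on held-out query examples. In a real model, the second-derivative term would come from automatic differentiation, not from a closed form.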
Abstract
Electrocardiogram (ECG) interpretation requires specialized expertise and often involves synthesizing insights from ECG signals with complex clinical queries posed in natural language. The scarcity of labeled ECG data, coupled with the diverse nature of clinical inquiries, presents a significant challenge for developing robust and adaptable ECG diagnostic systems. This work introduces a novel multimodal meta-learning method for few-shot ECG question answering, addressing the challenge of limited labeled data while leveraging the rich knowledge encoded within large language models (LLMs). Our LLM-agnostic approach integrates a pre-trained ECG encoder with a frozen LLM (e.g., LLaMA or Gemma) via a trainable fusion module, enabling the language model to reason about ECG data and generate clinically meaningful answers. Extensive experiments demonstrate superior generalization to unseen diagnostic tasks compared to supervised baselines, with notable performance even when only a limited number of ECG leads is available. For instance, in a 5-way 5-shot setting, our method using LLaMA-3.1-8B achieves accuracies of 84.6%, 77.3%, and 69.6% on single verify, choose, and query question types, respectively. These results highlight the potential of our method to enhance clinical ECG interpretation by combining signal processing with the nuanced language understanding capabilities of LLMs, particularly in data-constrained scenarios.
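As an illustration of what an N-way K-shot setting means in practice, the following sketch samples one episode from a labeled pool. It is a generic few-shot sampler under stated assumptions, not the paper's data pipeline: the function name `sample_episode`, the `n_query` parameter, and the use of plain `(item, label)` pairs are hypothetical, and what counts as a class (for example, an attribute-answer pair) is left to the caller.

```python
import random
from collections import defaultdict

def sample_episode(examples, n_way=5, k_shot=5, n_query=2, rng=None):
    """Sample one N-way K-shot episode from (item, label) pairs.

    Returns (support, query) lists of (item, episode_label) pairs, where
    episode_label is an index in 0..n_way-1 assigned per episode.
    """
    rng = rng or random.Random()

    # Group items by their original class label.
    by_class = defaultdict(list)
    for item, label in examples:
        by_class[label].append(item)

    # Only classes with enough items can supply both support and query sets.
    eligible = [c for c, items in by_class.items()
                if len(items) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query items")

    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in picked[:k_shot]]
        query += [(x, episode_label) for x in picked[k_shot:]]
    return support, query
```

With the defaults, each episode holds 5 classes with 5 support examples each (25 in total). A model adapts on the support set and is scored on the disjoint query set, so accuracy reflects adaptation to the sampled classes from those few examples, not memorization of a fixed label space.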
