ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation
Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, Hongyan Li, Shenda Hong
TL;DR
This paper tackles the reliability gap in multimodal large language models for ECG interpretation, where existing models frequently produce plausible but clinically incorrect analyses. It introduces ECG-R1, a reasoning MLLM built on three components: Protocol-Guided Instruction Data Generation, which grounds ECG interpretation in measurable physiological features; a modality-decoupled architecture with Interleaved Modality Dropout (IMD), which improves robustness when a modality is missing; and Reinforcement Learning with ECG Diagnostic Evidence Rewards (EDER), which strengthens evidence-based reasoning. The approach uses a two-stage training pipeline (supervised fine-tuning followed by RL) and a theoretically grounded IMD framework that provides guarantees on robustness and cross-modal consistency. Extensive experiments, including evaluation by licensed cardiologists, compare ECG-R1 against proprietary, open-source, and ECG-specialized MLLMs and show improved diagnostic accuracy, richer evidence grounding, and stable performance under missing-modality conditions. These findings suggest meaningful progress toward reliable, clinically aligned ECG interpretation. Public datasets and code enable further research, while the results underscore the need for cautious clinical deployment and verification in real-world settings.
Abstract
Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using \textit{Protocol-Guided Instruction Data Generation}, grounding interpretation in measurable ECG features and in monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with \textit{Interleaved Modality Dropout} to improve robustness and cross-modal consistency when either the ECG signal or the ECG image is missing. Third, we introduce \textit{Reinforcement Learning with ECG Diagnostic Evidence Rewards} to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not directly trust these outputs without independent verification. Code and data are publicly available at \url{https://github.com/PKUDigitalHealth/ECG-R1}, and an online platform can be accessed at \url{http://ai.heartvoice.com.cn/ECG-R1/}.
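
The abstract does not specify how Interleaved Modality Dropout is implemented, so the following is only a minimal sketch of the general idea of modality dropout, not the authors' method. It assumes that each training sample carries two inputs (an ECG signal embedding and an ECG image embedding) and that, with some probability, exactly one of them is hidden so the model learns to cope with a missing modality. The function name `modality_dropout`, the parameter `p_drop`, and the uniform choice between modalities are all hypothetical.

```python
import random
from typing import List, Optional, Tuple

Embedding = List[float]


def modality_dropout(
    signal: Embedding,
    image: Embedding,
    p_drop: float,
    rng: random.Random,
) -> Tuple[Optional[Embedding], Optional[Embedding]]:
    """Hide at most one modality per sample (hypothetical illustration).

    With probability p_drop, one of the two inputs is replaced by None,
    chosen uniformly. Both inputs are never hidden at the same time, so
    every sample keeps at least one view of the ECG.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be in [0, 1]")

    # Keep both modalities with probability 1 - p_drop.
    if rng.random() >= p_drop:
        return signal, image

    # Otherwise drop exactly one modality (assumed uniform choice).
    if rng.random() < 0.5:
        return None, image
    return signal, None


if __name__ == "__main__":
    rng = random.Random(0)
    signal_emb = [0.1, 0.2, 0.3]
    image_emb = [0.4, 0.5, 0.6]
    for _ in range(5):
        s, i = modality_dropout(signal_emb, image_emb, p_drop=0.5, rng=rng)
        print("signal kept:", s is not None, "| image kept:", i is not None)
```

Guaranteeing that at least one modality survives is a design choice of this sketch; the paper's actual dropout schedule and its interleaving of signal and image inputs may differ.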
