ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation

Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, Hongyan Li, Shenda Hong

TL;DR

This paper tackles the reliability gap in multimodal large language models for ECG interpretation, where existing models frequently produce plausible but clinically incorrect analyses. It introduces ECG-R1, a reasoning MLLM that grounds ECG interpretation in measurable physiological features via Protocol-Guided Instruction Data Generation, employs a modality-decoupled architecture with Interleaved Modality Dropout (IMD) to improve robustness when a modality is missing, and strengthens evidence-based reasoning through Reinforcement Learning with ECG Diagnostic Evidence Rewards (EDER). The approach uses a two-stage training pipeline (supervised fine-tuning followed by RL) and a theoretically grounded IMD framework that provides guarantees on robustness and cross-modal consistency. Experiments compare ECG-R1 against proprietary, open-source, and ECG-specialized MLLMs, and include evaluation by licensed cardiologists; they show improved diagnostic accuracy, richer evidence grounding, and stable performance when one modality is missing. These findings suggest meaningful progress toward reliable, clinically aligned ECG interpretation. Public datasets and code enable further research, and the results underscore the need for cautious clinical deployment and verification in real-world settings.

Abstract

Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using Protocol-Guided Instruction Data Generation, grounding interpretation in measurable ECG features and monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with Interleaved Modality Dropout to improve robustness and cross-modal consistency when either the ECG signal or ECG image is missing. Third, we present Reinforcement Learning with ECG Diagnostic Evidence Rewards to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not directly trust these outputs without independent verification. Code and data are publicly available at https://github.com/PKUDigitalHealth/ECG-R1, and an online platform can be accessed at http://ai.heartvoice.com.cn/ECG-R1/.

Paper Structure (59 sections, 5 theorems, 35 equations, 11 figures, 7 tables)


Key Result

Theorem 2.2

Under the coverage assumption (source label assump:coverage_main), $R_{\max}(\theta)\le \alpha^{-1}R_q(\theta)$, where in our implementation $\alpha=\min\{p_d/2,\,(1-p_d)p_s,\,(1-p_d)(1-p_s)\}$.
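
To make the bound concrete, the following is a minimal Python sketch that evaluates $\alpha$ from the formula above and checks the inequality numerically. It rests on assumptions not stated in this summary: the four terms are read as the probabilities of four input configurations under the training distribution $q$ (two of them sharing probability $p_d/2$), $R_q$ is taken to be the $q$-weighted average of per-configuration risks, and $R_{\max}$ their maximum. The function names and the risk values are hypothetical, chosen only for illustration.

```python
def imd_alpha(p_d: float, p_s: float) -> float:
    """Coverage constant alpha = min{p_d/2, (1-p_d)p_s, (1-p_d)(1-p_s)}."""
    if not (0.0 < p_d < 1.0 and 0.0 < p_s < 1.0):
        raise ValueError("p_d and p_s must lie strictly between 0 and 1")
    return min(p_d / 2, (1 - p_d) * p_s, (1 - p_d) * (1 - p_s))


def worst_case_bound(r_q: float, p_d: float, p_s: float) -> float:
    """Upper bound on R_max implied by R_max <= R_q / alpha."""
    return r_q / imd_alpha(p_d, p_s)


# Assumed reading: four configurations with these probabilities under q.
p_d, p_s = 0.4, 0.5
probs = [p_d / 2, p_d / 2, (1 - p_d) * p_s, (1 - p_d) * (1 - p_s)]

# Hypothetical per-configuration risks (illustrative numbers, not paper results).
risks = [0.9, 0.7, 0.3, 0.2]

r_q = sum(p * r for p, r in zip(probs, risks))  # average risk under q
r_max = max(risks)                              # worst-case configuration risk
bound = worst_case_bound(r_q, p_d, p_s)

print(f"alpha={imd_alpha(p_d, p_s):.3f}  R_q={r_q:.3f}  "
      f"R_max={r_max:.3f}  bound={bound:.3f}")
```

Under this reading the inequality follows because every configuration has probability at least $\alpha$ under $q$, so no single configuration's risk can exceed $R_q/\alpha$. The bound tightens as the smallest configuration probability grows.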

Figures (11)

  • Figure 1: Left: Attribute comparison among general/medical MLLMs, previous ECG-specialized MLLMs, and ECG-R1. General/medical MLLMs typically cannot perform signal analysis and lack high-quality ECG interpretation corpora, which often leads to hallucinated, clinically incorrect interpretations at test time. Previous ECG-specialized MLLMs often construct their training corpora purely by prompting LLMs with ECG features, which introduces medical errors that make the corpora unreliable, and they are neither robust nor cross-modally consistent when a modality is missing. Right: ECG-R1 follows a monograph-defined protocol to generate structured, clinically aligned interpretations and remains robust and cross-modally consistent when a modality is missing.
  • Figure 2: Framework of ECG-R1. Instruction generation builds a protocol-guided interpretation corpus by combining ECG grounding features with the monograph protocol. The architecture adopts a decoupled dual-encoder design with lightweight projectors that align modality-specific representations into a shared LLM space. Training follows a two-stage strategy, SFT followed by RL, and integrates IMD to enhance robustness and cross-modal consistency when a modality is missing.
  • Figure 3: Architecture Comparison of GEM and ECG-R1.
  • Figure 4: Modality Missing Results between Time-Series and Image Modalities.
  • Figure 5: Qualitative Comparison of ECG-Grounding and our ECG Protocol-Guided Grounding CoT.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Theorem 2.2: Robustness under IMD
  • Theorem 2.3: Consistency via excess risk
  • Lemma 4.1: Cross-entropy decomposition
  • Lemma 4.2: Pinsker's inequality
  • Lemma 4.3: Jensen for square root
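
For orientation, the three lemma names above correspond to well-known results. The block below gives their standard textbook forms for distributions $p$ and $q$ and a nonnegative random variable $X$, with KL divergence measured in nats. This is a sketch for reference only: the paper's own statements, notation, and conditioning may differ.

```latex
% Standard textbook forms; the paper's exact statements may differ.
\begin{align*}
  \text{Cross-entropy decomposition:} \quad
    & H(p, q) = H(p) + \mathrm{KL}(p \,\|\, q), \\
  \text{Pinsker's inequality:} \quad
    & \mathrm{TV}(p, q) \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(p \,\|\, q)}, \\
  \text{Jensen for the square root:} \quad
    & \mathbb{E}\big[\sqrt{X}\big] \le \sqrt{\mathbb{E}[X]}, \qquad X \ge 0.
\end{align*}
```

Chaining these three facts is a common way to turn an excess cross-entropy risk into a bound on the expected total-variation distance between two predictive distributions, which is consistent with the title of Theorem 2.3.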