MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, Mi Zhang

TL;DR

This work tackles automated ECG report generation by introducing MEIT, a multimodal ECG instruction-tuning framework that couples a lightweight ECG encoder with LLMs through a concise ECG-text alignment in self-attention. It creates a diverse instruction-tuning dataset from PTB-XL and MIMIC-IV-ECG, trains with LoRA adapters while freezing the LLM backbone, and evaluates across a four-task benchmark using nine open-source LLMs. Across two large ECG datasets, MEIT demonstrates superior report quality, robust zero-shot cross-domain transfer, and alignment with expert annotations, while remaining efficient through a lightweight fusion design. The study provides a comprehensive benchmark and a scalable pathway toward clinically reliable, instruction-following ECG report generation, highlighting practical potential and avenues for future enhancements such as retrieval-augmented generation and cross-domain knowledge integration.
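The "LoRA adapters while freezing the LLM backbone" part of the training recipe can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the class name `LoRALinear`, the helper `matmul`, and the toy dimensions are hypothetical and chosen only for illustration. It shows the generic LoRA idea, assuming the standard formulation in which a frozen weight `W` is augmented by a trainable low-rank product `B @ A` scaled by `alpha / rank`.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration of generic LoRA, not the MEIT code.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during tuning
        # Standard LoRA init: A is small random, B is zero,
        # so the adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Map an input vector of length d_in to an output vector of length d_out."""
        col = [[v] for v in x]
        base = matmul(self.weight, col)               # frozen path: W x
        delta = matmul(self.B, matmul(self.A, col))   # adapter path: B (A x)
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        """Only A and B would receive gradient updates."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage: a 3x4 frozen weight with a rank-2 adapter.
W = [[1.0, 2.0, 0.0, 0.0], [0.0, 1.0, 0.0, 3.0], [0.5, 0.0, 1.0, 0.0]]
layer = LoRALinear(W, rank=2, alpha=4)
print(layer.forward([1.0, 1.0, 1.0, 1.0]))
print(layer.trainable_parameter_count())
```

Because `B` starts at zero, the adapted layer reproduces the frozen layer exactly before any training step. The savings in trainable parameters only appear at realistic sizes, where the rank is much smaller than the layer dimensions; in this toy case the adapter is not smaller than `W`.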

Abstract

The electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT's results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in high-quality report generation, zero-shot capabilities, resilience to signal perturbation, and alignment with human expert evaluation. These findings emphasize the efficacy of MEIT and its potential for real-world clinical application.

Paper Structure (32 sections, 6 equations, 10 figures, 15 tables)

This paper contains 32 sections, 6 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: (a) Overview of MEIT; (b) Illustration of the architecture of Report Generator.
  • Figure 2: Zero-shot performance on PTB-XL dataset. "IT" denotes instruction tuning.
  • Figure 3: Signal perturbation robustness analysis on various LLMs.
  • Figure 4: Visualizations of instruction tuning loss and METEOR score.
  • Figure 5: Model scaling performance on MIMIC-IV-ECG and PTB-XL.
  • ...and 5 more figures