Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang

TL;DR

Experiments show that PULSE, an MLLM tailored for ECG image comprehension, sets a new state of the art, outperforming general MLLMs with an average accuracy improvement of 15% to 30% and highlighting its potential to enhance ECG interpretation in clinical practice.

Abstract

The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.

Paper Structure

This paper contains 29 sections, 18 figures, 7 tables.

Figures (18)

  • Figure 1: The proposed PULSE demonstrates superior performance across multiple in-domain and out-of-domain datasets on our constructed ECGBench compared with advanced proprietary MLLMs (e.g., GPT-4o). Notably, the proprietary MLLMs often fail to interpret ECG images accurately, generating responses that are well-structured and contextually relevant but ultimately incorrect (errors highlighted in red) when compared with the ground-truth diagnosis.
  • Figure 2: ECGInstruct: a diverse, large-scale collection of instruction tuning datasets for ECG image interpretation. (1) ECG images are synthesized from raw signal recordings with various distortions that mimic real-world printed ECG images. (2) ECGInstruct is curated based on clinician-defined ECG-related tasks, original diagnoses and clinical reports, and diverse task types. Additional quality checking is applied to filter out lower-scored instructions.
  • Figure 3: The data curation process for ECGBench. Four key tasks are involved: (1) two repurposed tasks (abnormality detection and report generation) derived from existing ECG datasets, where ECG images are synthesized from raw signals and queries/answers are extracted from diagnostic and clinical reports; (2) two newly developed tasks using external resources, where ECG images and associated questions and answers are collected and generated from real-world sources.
  • Figure 4: Score breakdown of report generation performance.
  • Figure A1: Examples of basic feature recognition instructions for finetuning PULSE.
  • ...and 13 more figures