
anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding

Haitao Li, Ziyu Li, Yiheng Mao, Ziyi Liu, Zhoujian Sun, Zhengxing Huang

TL;DR

anyECG-chat introduces a generalist ECG-MLLM capable of handling dynamic-length, reduced-lead, and multi-ECG inputs to perform diverse tasks including report generation, fine-grained localization, and multi-ECG comparison. It relies on the anyECG dataset family and a three-stage curriculum that progressively aligns ECG perception with instruction-following in a LLaMA-based model augmented by LoRA adapters. The architecture couples a pre-trained ECG encoder with a modality connector and strategic input tokens to support dynamic ECG inputs, while the evaluation demonstrates robust cross-task performance, including zero-shot and multi-turn capabilities. This work provides a practical framework for flexible, interactive ECG understanding with potential impact on clinical workflow and home-monitoring contexts.
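
The TL;DR says that a pre-trained ECG encoder, a modality connector, and special input tokens together let the model accept dynamic-length, reduced-lead, and multi-ECG inputs. The sketch below shows one plausible way to lay out such inputs before encoding, so that the number of ECG tokens grows with recording length. Everything concrete in it is a hypothetical illustration, not a detail taken from the paper or its repository: the assumed 500 Hz rate, the 10 s window, zero-padding of missing leads to 12, and the `<ecg_i>` / `<ecg_patch>` token names.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical constants for illustration only; not taken from the paper.
NUM_LEADS = 12   # standard 12-lead layout; reduced-lead ECGs are padded up to it
WINDOW = 5000    # samples per window, e.g. 10 s at an assumed 500 Hz


@dataclass
class Segment:
    """One fixed-length window of one ECG, padded to NUM_LEADS leads."""
    ecg_index: int
    window_index: int
    signal: List[List[float]]  # NUM_LEADS rows of WINDOW samples
    lead_mask: List[bool]      # True where the lead was actually recorded


def split_ecg(ecg: Sequence[Sequence[float]], present_leads: Sequence[int],
              ecg_index: int) -> List[Segment]:
    """Zero-pad a reduced-lead ECG to 12 leads and cut it into fixed windows.

    `ecg` holds one row per recorded lead, in the order given by
    `present_leads`. The final partial window is zero-padded.
    """
    length = len(ecg[0])
    n_windows = max(1, -(-length // WINDOW))  # ceiling division
    mask = [lead in present_leads for lead in range(NUM_LEADS)]
    segments = []
    for w in range(n_windows):
        start = w * WINDOW
        full = [[0.0] * WINDOW for _ in range(NUM_LEADS)]
        for row, lead in zip(ecg, present_leads):
            chunk = list(row[start:start + WINDOW])
            full[lead][:len(chunk)] = chunk  # copy recorded samples; rest stays 0
        segments.append(Segment(ecg_index, w, full, list(mask)))
    return segments


def build_prompt_layout(ecgs: Sequence[Sequence[Sequence[float]]],
                        leads_per_ecg: Sequence[Sequence[int]]) -> List[str]:
    """Return a placeholder-token layout for one or more ECGs.

    Each window gets one hypothetical <ecg_patch> placeholder (where an
    encoder + connector embedding would later be spliced in), and each ECG is
    wrapped in hypothetical <ecg_i> ... </ecg_i> markers so multiple ECGs
    stay distinguishable to the language model.
    """
    layout: List[str] = []
    for i, (ecg, leads) in enumerate(zip(ecgs, leads_per_ecg)):
        layout.append(f"<ecg_{i}>")
        layout.extend("<ecg_patch>" for _ in split_ecg(ecg, leads, i))
        layout.append(f"</ecg_{i}>")
    return layout
```

For example, a 10 s 12-lead hospital ECG followed by a 30 s single-lead home recording yields one placeholder for the first ECG and three for the second. Windowing keeps the encoder's input shape fixed while letting the total sequence length vary with the recording.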

Abstract

The advent of multimodal large language models (MLLMs) has sparked interest in their application to electrocardiogram (ECG) analysis. However, existing ECG-focused MLLMs mainly target report generation and are often limited to a single 12-lead, short-duration (10 s) ECG input, which underutilizes the potential of MLLMs. We therefore aim to develop an MLLM for ECG analysis that supports a broader range of tasks and more flexible ECG inputs. Existing ECG-QA datasets, however, cover only a narrow range of tasks. To address this gap, we first construct the anyECG dataset, which encompasses a wide variety of tasks, including report generation, abnormal waveform localization, and open-ended question answering. In addition to standard hospital ECGs, it introduces long-duration reduced-lead ECGs for home environments and multi-ECG comparison scenarios commonly encountered in clinical practice. Furthermore, we propose the anyECG-chat model, which supports dynamic-length ECG inputs and multiple ECG inputs, and train it on the anyECG dataset with a three-stage curriculum training recipe. A comprehensive evaluation demonstrates that anyECG-chat supports various practical application scenarios: not only common report generation tasks but also abnormal waveform localization in long-duration reduced-lead ECGs from home environments and comprehensive comparative analysis of multiple ECGs. Our code and data are available at: https://github.com/CuCl-2/anyECG-chat.

Paper Structure

This paper contains 33 sections, 1 equation, 5 figures, 15 tables.

Figures (5)

  • Figure 1: The overview of anyECG-chat architecture.
  • Figure 2: Results of Localization and Zero-Shot Single Lead ECG Localization. LLaVA-Med, MEIT, and PULSE failed to provide second-level localization (scoring 0), so they are omitted from the figure.
  • Figure 3: The Score Distribution on MIMIC Multi-ECG QA.
  • Figure 4: Statistics of the MIMIC Multi-ECG QA dataset.
  • Figure 5: Comparison of LLM-based scoring and human scoring across different score levels.
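
Figure 2 reports second-level localization, where models that produced no time intervals scored 0. The caption does not say how partially overlapping predictions are scored. The sketch below therefore assumes a temporal intersection-over-union (IoU) metric and a simple regex that pulls "start-end s" intervals out of a free-text answer. Both the metric and the answer format are assumptions made for illustration, not the paper's evaluation protocol; `parse_intervals`, `interval_iou`, and `localization_score` are hypothetical names.

```python
import re
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

# Hypothetical answer format: "<start>-<end> s" or "<start> to <end> s".
_INTERVAL = re.compile(r"(\d+(?:\.\d+)?)\s*(?:-|to)\s*(\d+(?:\.\d+)?)\s*s")


def parse_intervals(answer: str) -> List[Interval]:
    """Extract (start, end) intervals in seconds from a free-text answer."""
    return [(float(a), float(b)) for a, b in _INTERVAL.findall(answer)]


def interval_iou(pred: Interval, ref: Interval) -> float:
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(pred[1], ref[1]) - max(pred[0], ref[0]))
    union = (pred[1] - pred[0]) + (ref[1] - ref[0]) - inter
    return inter / union if union > 0 else 0.0


def localization_score(answer: str, ref: List[Interval]) -> float:
    """Mean over reference intervals of the best IoU with any predicted one.

    An answer with no parseable second-level interval scores 0, mirroring
    how Figure 2 treats models that failed to localize.
    """
    if not ref:
        raise ValueError("ref must contain at least one interval")
    pred = parse_intervals(answer)
    if not pred:
        return 0.0
    return sum(max(interval_iou(p, r) for p in pred) for r in ref) / len(ref)
```

Under these assumptions, an answer such as "PVC at 12.0-14.5 s" scored against a reference interval of 12.5-14.5 s gets an IoU of 0.8, while an answer that only describes the rhythm in words scores 0.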