Heartcare Suite: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding

Yihan Xie, Sijing Li, Tianwei Lin, Zhuonan Wang, Chenglin Yang, Yu Zhong, Wenjie Yan, Wenqiao Zhang, Xiaogang Guo, Jun Xiao, Yueting Zhuang, Beng Chin Ooi

TL;DR

Heartcare Suite tackles the challenge of cross-modal semantic alignment for ECG by treating dual signal and image streams as a unified modality. It introduces Heartcare-400K, Heartcare-Bench, and HeartcareGPT with Beat tokenization and DSPA to map signals, images, and text into a shared representation space. The authors show consistent improvements across diverse ECG understanding tasks and provide a scalable data-engineering and evaluation framework. This work lays a foundation for extending Med-MLLMs to physiological signal domains with clinically grounded reasoning.

Abstract

Although electrocardiograms (ECGs) play a dominant role in cardiovascular diagnosis and treatment, their intrinsic data forms and representational patterns pose significant challenges for medical multimodal large language models (Med-MLLMs) in achieving cross-modal semantic alignment. To address this gap, we propose Heartcare Suite, a unified ECG suite designed for dual signal-image modeling and understanding. (i) Heartcare-400K: we build a fine-grained ECG instruction dataset on top of our data pipeline engine, HeartAgent, by integrating 12,170 high-quality clinical ECG reports from top hospitals with open-source data; (ii) Heartcare-Bench: a systematic benchmark that assesses model performance in multi-perspective ECG understanding and cross-modal generalization, providing guidance for optimizing ECG comprehension models; (iii) HeartcareGPT: built upon Beat, a structure-aware discrete tokenizer, we propose the DSPA (Dual Stream Projection Alignment) paradigm, a dual-encoder projection alignment mechanism that enables joint optimization and modeling of native ECG signals and images within a shared feature space. Heartcare achieves consistent improvements across diverse ECG understanding tasks, validating both the effectiveness of the unified modeling paradigm and the necessity of a high-quality data pipeline, and establishing a methodological foundation for extending Med-MLLMs toward physiological signal domains. Our project is available at https://github.com/DCDmllm/Heartcare-Suite .

Paper Structure

This paper contains 25 sections, 11 equations, 11 figures, and 18 tables.

Figures (11)

  • Figure 1: The proposed Heartcare-400K dataset. Heartcare-400K aggregates real-world ECG data, supporting Closed-QA, Open-QA, Report Generation and Signal Prediction.
  • Figure 2: Framework of multimodal data engine for QA generation.
  • Figure 3: Architecture of HeartcareGPT. The dual-form ECG inputs are routed and encoded with modality-specific expert projections aligned to the LLM backbone. The unified autoregressive architecture efficiently supports interleaved and joint modeling of ECG multimodal inputs.
  • Figure 4: Results of ablation studies on training pipeline.
  • Figure 5: (a) Results of ablation studies on multimodal integration. (b) Expert preference distribution across models. Inner-ring results are based on Open-QA evaluations; outer-ring results are based on report generation evaluations.
  • ...and 6 more figures