Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework

William Han, Chaojing Duan, Zhepeng Cen, Yihang Yao, Xiaoyu Song, Atharva Mhaskar, Dylan Leong, Michael A. Rosenberg, Emerson Liu, Ding Zhao

TL;DR

This work addresses the question of which ECG input representation best suits autoregressive Electrocardiogram-Language Models (ELMs). It introduces a unified benchmark across 6 public ECG datasets and 5 evaluation metrics to compare raw signals, plotted images, and symbolic tokenizations, including the ECG-Byte-based end-to-end approach. The study finds that symbolic, tokenized representations ($X_{\text{ID}}$) with End-to-End training consistently yield the most statistically significant improvements in text generation, robustness, and scaling, outperforming raw and image modalities. Ablations across LLM backbones, ECG length $L$, and sequence length $T$ illuminate trade-offs and guidance for practical deployment, with an open-source framework released to standardize future ELM research and development.

Abstract

Recent advances have increasingly applied large language models (LLMs) to electrocardiogram (ECG) interpretation, giving rise to Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual query, an ELM autoregressively generates a free-form textual response. Unlike traditional classification-based systems, ELMs emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. To realize this potential, researchers are curating instruction-tuning datasets that pair ECGs with textual dialogues and are training ELMs on these resources. Before scaling ELMs further, however, a fundamental question remains unexplored: what is the most effective ECG input representation? In recent works, three candidate representations have emerged: raw time-series signals, rendered images, and discretized symbolic sequences. We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics. We find that symbolic representations achieve the greatest number of statistically significant wins over both signal and image inputs. We further ablate the LLM backbone, ECG duration, and token budget, and we evaluate robustness to signal perturbations. We hope that our findings offer clear guidance for selecting input representations when developing the next generation of ELMs.

Paper Structure

This paper contains 34 sections, 6 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: A high-level overview of our training and evaluation pipeline. The input data is represented as $X = \{X_{\text{sig}}, X^*_{\text{sig}}, X_{\text{img}}, X_{\text{ID}}\}$ as seen in Step 1. Step 2 comprises two modes: Encoder training and LLM training. During Encoder training, an ECG-specific encoder is trained from scratch. In LLM training, any method used to compress the ECG into an appropriate representation for the ELM is denoted by $\mathcal{F}(*)$ for simplicity; this mode covers all training methods described in Subsection \ref{elm_train}. In Step 3, inference is performed using a conversational template applicable to both single- and multi-turn settings.
  • Figure 2: Spider charts for the performance of each model and training paradigm. Note that $X_{\text{sig}}$, $X^*_{\text{sig}}$, $X_{\text{img}}$, and $X_{\text{ID}}$ use the 2-Stage Scratch, End-to-End LLaVA, End-to-End LLaVA, and End-to-End training paradigms, respectively. The same results are tabulated in Table \ref{tab:main} of Appendix \ref{apd:results}.
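
To make the three candidate representations named in the abstract more concrete, the following is a minimal illustrative sketch, not the paper's pipeline. It builds a synthetic 12-lead array standing in for a raw signal, rasterizes it into a crude image, and discretizes it into symbol IDs by uniform amplitude binning. All function names, the bin count, and the image height are hypothetical choices made for illustration. The uniform binning is a simplified stand-in for symbolic tokenization and does not reproduce ECG-Byte.

```python
import numpy as np


def make_synthetic_ecg(n_leads=12, length=500, seed=0):
    """Return a toy (n_leads, length) array standing in for a raw ECG signal."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0, length)
    base = np.sin(2.0 * np.pi * 1.2 * t)  # crude periodic "beat"
    return base[None, :] + 0.05 * rng.standard_normal((n_leads, length))


def _min_max_normalize(signal):
    """Scale the whole array into [0, 1) using its global min and max."""
    lo, hi = signal.min(), signal.max()
    return (signal - lo) / (hi - lo + 1e-8)


def to_symbolic_ids(signal, n_bins=26):
    """Assumed stand-in for a symbolic input: uniform amplitude binning.

    Each sample is mapped to an integer in [0, n_bins - 1], and the leads
    are flattened into one 1-D sequence of symbol IDs.
    """
    norm = _min_max_normalize(signal)
    ids = np.minimum((norm * n_bins).astype(np.int64), n_bins - 1)
    return ids.reshape(-1)


def to_image(signal, height=64):
    """Assumed stand-in for an image input: one white pixel per sample.

    Leads are stacked vertically, each in its own band of `height` rows.
    """
    n_leads, length = signal.shape
    norm = _min_max_normalize(signal)
    rows = np.round((1.0 - norm) * (height - 1)).astype(int)
    img = np.zeros((n_leads * height, length), dtype=np.uint8)
    cols = np.arange(length)
    for lead in range(n_leads):
        img[lead * height + rows[lead], cols] = 255
    return img


if __name__ == "__main__":
    x_sig = make_synthetic_ecg()      # raw time-series representation
    x_img = to_image(x_sig)           # rendered-image representation
    x_id = to_symbolic_ids(x_sig)     # discretized symbolic representation
    print(x_sig.shape, x_img.shape, x_id.shape)
```

The point of the sketch is only that the same recording can be handed to a model as a float array, a pixel grid, or an integer sequence. The third form can be consumed by an LLM's token embedding directly, whereas the first two typically need a separate encoder.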