CAMEL: An ECG Language Model for Forecasting Cardiac Events

Neelay Velingker, Alaia Solko-Breslin, Mayank Keoliya, Seewon Choi, Jiayi Xin, Anika Marathe, Alireza Oraii, Rajat Deo, Sameed Khatana, Rajeev Alur, Mayur Naik, Eric Wong

TL;DR

CAMEL is the first ECG language model (ELM) capable of inference over longer signal durations, which enables forecasting of future cardiac events. It matches or surpasses existing ELMs and fully supervised baselines both in- and out-of-distribution.

Abstract

Electrocardiograms (ECGs) are electrical recordings of the heart that are critical for diagnosing cardiovascular conditions. ECG language models (ELMs) have recently emerged as a promising framework for ECG classification accompanied by report generation. However, current models cannot forecast future cardiac events, despite the immense clinical value of forecasting for planning earlier intervention. To address this gap, we propose CAMEL, the first ELM capable of inference over longer signal durations, which enables its forecasting capability. Our key insight is a specialized ECG encoder that enables cross-understanding of ECG signals with text. We train CAMEL using established LLM training procedures, combining LoRA adaptation with a curriculum learning pipeline. Our curriculum includes ECG classification, metric calculations, and multi-turn conversations to elicit reasoning. CAMEL demonstrates strong zero-shot performance across 6 tasks and 9 datasets, including ECGForecastBench, a new benchmark that we introduce for forecasting arrhythmias. CAMEL is on par with or surpasses ELMs and fully supervised baselines both in- and out-of-distribution, achieving SOTA results on ECGBench (+7.0% absolute average gain) as well as ECGForecastBench (+12.4% over fully supervised models and +21.1% over zero-shot ELMs).

Paper Structure (64 sections, 5 equations, 8 figures, 13 tables)

Figures (8)

  • Figure 1: Example of CAMEL's forecasting capability. In the top example, CAMEL takes as input normal sinus rhythm ECG at time $T$ and correctly forecasts AFIB at $T+3$ minutes by reasoning over the RMSSD, RR-interval, and PAC count (reasoning highlighted). In the bottom example, CAMEL correctly predicts a normal outcome based on accurately extracted statistics.
  • Figure 2: An overview of the CAMEL architecture. 1-second, single-lead ECG segments from two patients are encoded and combined with text token embeddings. The resulting sequence is processed by an LLM backbone (MedGemma-4B with LoRA adapters) to generate a clinical report. Fixed models are shown in blue, and trainable models are shown in green.
  • Figure 3: Forecasting performance (Macro-F1) in predicting AFib, AFlutter, or Sinus Rhythm across input window $w$ (s) and horizon $h$ (s). We report zero-shot results from GPT-5.2 (with Code Interpreter and high effort), PULSE, GEM, and CAMEL, supervised training results for XGB and CNN, and linear probing results for CAMEL Probe. PULSE and GEM only support 10-second ECG inputs ($w = 10$). CAMEL outperforms all baselines, and its performance improves with longer input windows, highlighting the importance of supporting longer ECG recordings.
  • Figure 4: Example report generated by CAMEL for a sample from the PTB-XL dataset. The ground truth label is "Sinus rhythm. Incomplete right bundle branch block. PR interval is at the upper limit. Otherwise, normal ECG."
  • Figure 5: Example of a report generated by CAMEL for a sample from the MIMIC-IV dataset.
  • ...and 3 more figures
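
The Figure 1 caption refers to RMSSD and RR-interval statistics as inputs to CAMEL's reasoning. For readers unfamiliar with these quantities, the following is a minimal sketch of the standard textbook definitions: RR intervals are the gaps between successive R peaks, and RMSSD is the root mean square of successive RR-interval differences. The function names and the sample values are hypothetical and chosen for illustration; this is not the paper's extraction pipeline, which is not described in this section.

```python
import math


def rr_intervals_ms(r_peak_times_s):
    """Convert R-peak timestamps (seconds) into successive RR intervals (milliseconds).

    Hypothetical helper: assumes R peaks have already been detected upstream.
    """
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def rmssd_ms(rr_ms):
    """Root mean square of successive differences between RR intervals (milliseconds)."""
    if len(rr_ms) < 2:
        raise ValueError("RMSSD needs at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Illustrative (made-up) R-peak times for a short, fairly regular rhythm.
peaks_s = [0.00, 0.80, 1.61, 2.40, 3.20]
rr = rr_intervals_ms(peaks_s)
print("RR intervals (ms):", [round(x, 1) for x in rr])
print("RMSSD (ms):", round(rmssd_ms(rr), 2))
```

A highly irregular rhythm produces larger beat-to-beat differences and therefore a larger RMSSD than a regular rhythm, which is why the statistic is a natural summary of rhythm variability.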