Retrieval-Augmented Generation for Electrocardiogram-Language Models

Xiaoyu Song, William Han, Tony Chen, Chaojing Duan, Michael A. Rosenberg, Emerson Liu, Ding Zhao

TL;DR

This work introduces the first open-source Retrieval-Augmented Generation (RAG) pipeline for Electrocardiogram-Language Models (ELMs), grounding natural language generation (NLG) outputs in retrieved ECG data and diagnostic reports. The framework combines domain-specific ECG preprocessing, a RAG database of signals, features, and reports built with FAISS, and an ECG-Byte/Llama-based language model trained with an autoregressive objective that conditions on retrieved content. Across three public ECG datasets, RAG-enhanced ELMs achieve substantial gains in BLEU-4 and related metrics, with ablations clarifying design choices: RAG is most effective when used during both training and inference, a smaller top-$k$ can outperform a larger one, the placement of retrieved content in the prompt is flexible, and retrieval accuracy is critical. The open-source release and systematic ablations provide a reproducible, practical foundation for future RAG-enabled ECG interpretation and dialogue systems.

Abstract

Interest in generative Electrocardiogram-Language Models (ELMs) is growing, as they can produce textual responses conditioned on ECG signals and textual queries. Unlike traditional classifiers that output label probabilities, ELMs are more versatile, supporting domain-specific tasks (e.g., waveform analysis, diagnosis, prognosis) as well as general tasks (e.g., open-ended questions, dialogue). Retrieval-Augmented Generation (RAG), widely used in Large Language Models (LLMs) to ground LLM outputs in retrieved knowledge, helps reduce hallucinations and improve natural language generation (NLG). However, despite its promise, no open-source implementation or systematic study of RAG pipeline design for ELMs currently exists. To address this gap, we present the first open-source RAG pipeline for ELMs, along with baselines and ablation studies for NLG. Experiments on three public datasets show that ELMs with RAG consistently improve performance over non-RAG baselines, and they highlight key ELM design considerations. Our code is available at: https://github.com/willxxy/ECG-Bench.

Paper Structure

This paper contains 12 sections, 2 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Our RAG pipeline operates as follows: given an input ECG, we optionally extract features and query a RAG database of ECG signals, features, and diagnostic reports. We retrieve the top-$k$ similar diagnostic reports, construct a prompt (system prompt + retrieved diagnostic reports + ECG tokens + query), and use it to condition the ELM to generate the response.
  • Figure 2: Our RAG pipeline demonstrates flexibility across multiple ELM architectures while consistently improving BLEU-4 and accuracy.
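
The retrieve-then-prompt flow described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: a brute-force L2 search over a tiny in-memory list stands in for the FAISS index, and the function names, feature vectors, report texts, ECG token strings, and prompt labels are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical database entry: (ECG feature vector, diagnostic report).
Entry = Tuple[Sequence[float], str]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(query: Sequence[float], database: List[Entry], k: int) -> List[str]:
    """Return the reports of the k nearest entries (brute force, standing in for FAISS)."""
    ranked = sorted(database, key=lambda entry: l2_distance(query, entry[0]))
    return [report for _, report in ranked[:k]]


def build_prompt(system_prompt: str, reports: List[str],
                 ecg_tokens: List[str], question: str) -> str:
    """Assemble: system prompt + retrieved reports + ECG tokens + query."""
    retrieved = "\n".join(
        f"Retrieved report {i + 1}: {r}" for i, r in enumerate(reports)
    )
    return "\n".join([
        system_prompt,
        retrieved,
        "ECG: " + " ".join(ecg_tokens),
        "Query: " + question,
    ])


# Toy data, invented purely for illustration.
database: List[Entry] = [
    ([0.0, 0.0], "Normal sinus rhythm."),
    ([1.0, 1.0], "Atrial fibrillation."),
    ([0.1, 0.0], "Sinus bradycardia."),
]
query_features = [0.02, 0.0]

top_reports = retrieve_top_k(query_features, database, k=2)
prompt = build_prompt(
    system_prompt="You are an ECG assistant.",
    reports=top_reports,
    ecg_tokens=["<sig_12>", "<sig_87>"],  # placeholder ECG tokens
    question="What rhythm is shown?",
)
print(prompt)
```

The resulting string would then condition the ELM during generation. The ordering of the prompt parts here follows the caption; the TL;DR notes that the placement of retrieved content is flexible, so other orderings are plausible.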