CardioRAG: A Retrieval-Augmented Generation Framework for Multimodal Chagas Disease Detection

Zhengyang Shen, Xuehao Zhai, Hua Tu, Mayue Shi

TL;DR

The paper tackles Chagas disease screening in settings with limited serology by proposing CardioRAG, a retrieval-augmented generation framework that fuses interpretable ECG biomarkers with LLM-based diagnostic reasoning. It combines ECG feature extraction, VAE-based latent representations for case retrieval, and structured LLM outputs to deliver evidence-based diagnoses with confidence scores. Key findings show that a simplified prompt design and moderate, demographic-aware retrieval (k=8) yield the best balance of accuracy and recall, achieving up to 58.59% accuracy and 89.80% recall on a 100-patient test set. This approach supports high-recall triage for serological testing in low-resource regions and demonstrates how clinical indicators can be embedded into trustworthy medical AI systems.

Abstract

Chagas disease affects nearly 6 million people worldwide, with Chagas cardiomyopathy representing its most severe complication. In regions where serological testing capacity is limited, AI-enhanced electrocardiogram (ECG) screening provides a critical diagnostic alternative. However, existing machine learning approaches face challenges such as limited accuracy, reliance on large labeled datasets, and more importantly, weak integration with evidence-based clinical diagnostic indicators. We propose a retrieval-augmented generation framework, CardioRAG, integrating large language models with interpretable ECG-based clinical features, including right bundle branch block, left anterior fascicular block, and heart rate variability metrics. The framework uses variational autoencoder-learned representations for semantic case retrieval, providing contextual cases to guide clinical reasoning. Evaluation demonstrated high recall performance of 89.80%, with a maximum F1 score of 0.68 for effective identification of positive cases requiring prioritized serological testing. CardioRAG provides an interpretable, clinical evidence-based approach particularly valuable for resource-limited settings, demonstrating a pathway for embedding clinical indicators into trustworthy medical AI systems.
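The abstract reports recall and F1, the metrics that matter most when a screening model is used to decide who gets serological testing first. As a reminder of how these quantities relate, the following is a minimal sketch that computes them from binary labels. The function name `screening_metrics` and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
def screening_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels made up for illustration only (not the paper's test set).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 0, 1, 1]
toy_metrics = screening_metrics(toy_true, toy_pred)
```

In a triage setting, a high recall with a modest F1 means few positive patients are missed, at the cost of sending some negative patients on for confirmatory serology.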

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The CardioRAG Framework for Chagas disease diagnosis from 12-lead ECG signals. The system preprocesses raw ECG data, extracts clinical and latent features via VAE, retrieves relevant cases from a RAG database, and generates structured diagnoses with confidence scores using a large language model.
  • Figure 2: Impact of prompt engineering (top-k retrieved cases, k=8). Configurations: P1 Detailed (baseline, full ECG criteria and clinical instructions), P2 Simplified Clinical (without detailed ECG criteria for RBBB/LAFB), P3 Context-Free (without diagnostic background), P4 Conservative (includes cautionary guidance for positive diagnoses).
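
The retrieval step named in Figure 1 (looking up similar cases by their VAE latent vectors, with k=8) can be sketched roughly as follows. This is a hedged illustration only: the summary does not say how similarity is measured or how demographics enter the retrieval, so the cosine similarity, the sex and age-window filter, the fallback to the full database, and all names (`retrieve_top_k`, `age_window`) are assumptions rather than the authors' implementation.

```python
import numpy as np


def retrieve_top_k(query_latent, query_age, query_sex,
                   db_latents, db_ages, db_sexes, k=8, age_window=10):
    """Return (indices, similarities) of the k stored cases most similar to the query.

    Assumed scheme: keep cases with the same sex and an age within
    `age_window` years, then rank them by cosine similarity of latent vectors.
    """
    db_latents = np.asarray(db_latents, dtype=float)
    db_ages = np.asarray(db_ages)
    db_sexes = np.asarray(db_sexes)
    query_latent = np.asarray(query_latent, dtype=float)

    # Demographic filter (assumption): same sex, similar age.
    mask = (db_sexes == query_sex) & (np.abs(db_ages - query_age) <= age_window)
    candidates = np.flatnonzero(mask)
    if candidates.size < k:
        # Too few matching cases: fall back to searching the whole database.
        candidates = np.arange(len(db_latents))

    # Cosine similarity between the query and each candidate latent vector.
    q = query_latent / np.linalg.norm(query_latent)
    c = db_latents[candidates]
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = c @ q

    # Keep the k most similar candidates, best first.
    order = np.argsort(-sims)[:k]
    return candidates[order], sims[order]
```

The retrieved indices would then be used to look up the stored cases (their clinical features and labels) that are placed in the LLM prompt as context.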