ICA-RAG: Information Completeness Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis

Jiawei He, Mingyi Jia, Zhihao Jia, Junwen Duan, Yan Song, Jianxin Wang

TL;DR

ICA-RAG tackles the inefficiency and noise of indiscriminate retrieval in disease diagnosis by introducing an adaptive retrieval controller driven by information completeness. It segments long EMR inputs into text units, labels their importance with a classifier, and computes a normalized completeness score $I_{norm}$ to determine when retrieval is needed. The retrieval stage operates at the sentence level with chunk-based reranking and document-level aggregation, followed by a knowledge filter prompted with differential diagnoses, which produces a focused knowledge set for final diagnosis generation. Experiments on three Chinese EMR datasets show that ICA-RAG consistently outperforms baselines on F1 metrics and improves efficiency, highlighting its practical value for reliable, scalable clinical RAG systems.
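The retrieval decision described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the label names, the weights in `IMPORTANCE_WEIGHTS`, the averaging formula used as a stand-in for $I_{norm}$, and the default threshold are all assumptions, since this summary does not give the actual definitions. The only behavior taken from the source is that retrieval is triggered when the score falls below a predefined threshold.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set and weights; the paper's actual labels and
# weighting scheme are not stated in this summary.
IMPORTANCE_WEIGHTS = {"high": 1.0, "medium": 0.5, "low": 0.0}


@dataclass
class TextUnit:
    text: str
    importance: str  # label from an (assumed) importance classifier


def completeness_score(units: List[TextUnit]) -> float:
    """Assumed stand-in for I_norm: mean importance weight, in [0, 1]."""
    if not units:
        return 0.0  # no usable information at all
    total = sum(IMPORTANCE_WEIGHTS[u.importance] for u in units)
    return total / len(units)


def needs_retrieval(units: List[TextUnit], threshold: float = 0.5) -> bool:
    """Trigger retrieval only when the score falls below the threshold."""
    return completeness_score(units) < threshold


if __name__ == "__main__":
    record = [
        TextUnit("Fever and productive cough for five days.", "high"),
        TextUnit("Patient arrived with a family member.", "low"),
        TextUnit("No known drug allergies.", "low"),
    ]
    print(completeness_score(record), needs_retrieval(record))
```

Under these assumptions, a record dominated by low-importance units scores below the threshold and is routed to retrieval, while a record rich in high-importance units is diagnosed directly.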

Abstract

Retrieval-Augmented Large Language Models (LLMs), which integrate external knowledge, have shown remarkable performance in medical domains, including clinical diagnosis. However, existing RAG methods often struggle to tailor retrieval strategies to diagnostic difficulty and input sample informativeness. This limitation leads to excessive and often unnecessary retrieval, impairing computational efficiency and increasing the risk of introducing noise that can degrade diagnostic accuracy. To address this, we propose ICA-RAG (Information Completeness Guided Adaptive Retrieval-Augmented Generation), a novel framework for enhancing RAG reliability in disease diagnosis. ICA-RAG utilizes an adaptive control module to assess the necessity of retrieval based on the input's information completeness. By optimizing retrieval and incorporating knowledge filtering, ICA-RAG better aligns retrieval operations with clinical requirements. Experiments on three Chinese electronic medical record datasets demonstrate that ICA-RAG significantly outperforms baseline methods, highlighting its effectiveness in clinical diagnosis.

Paper Structure

This paper contains 16 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Illustration of three different RAG paradigms for solving the clinical diagnosis task.
  • Figure 2: The overall architecture of our proposed framework, ICA-RAG, which consists of three stages. Stage (a) performs inference and retrieval decision-making based on fine-grained information density. Stage (b) focuses on knowledge retrieval and integration. Note that Stages (b) and (c) are activated only when the score computed in Stage (a) falls below a predefined threshold.
  • Figure 3: Details of our proposed annotation strategy. During the annotation process, we adopt different annotation strategies based on the responses generated by the LLM.
  • Figure 4: A Comparative Analysis of Computational Time Expenditure and Diagnostic Performance Between the Proposed Method and Selected Baseline Methods on the CMEMR Dataset.