ICA-RAG: Information Completeness Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis
Jiawei He, Mingyi Jia, Zhihao Jia, Junwen Duan, Yan Song, Jianxin Wang
TL;DR
ICA-RAG tackles the inefficiency and noise of indiscriminate retrieval in disease diagnosis by introducing an adaptive retrieval controller driven by information completeness. It segments long EMR inputs into text units, labels the importance of each unit with a classifier, and computes a normalized completeness score $I_{norm}$ to decide whether retrieval is needed. When retrieval is triggered, it operates at the sentence level with chunk-based reranking and document-level aggregation, and a knowledge filter prompted with differential diagnosis then reduces the results to a focused knowledge set for final diagnosis generation. Experiments on three Chinese EMR datasets show that ICA-RAG consistently outperforms baselines on F1 metrics and improves efficiency, highlighting its practical value for reliable, scalable clinical RAG systems.
Abstract
Retrieval-Augmented Large Language Models (LLMs), which integrate external knowledge, have shown remarkable performance in medical domains, including clinical diagnosis. However, existing RAG methods often struggle to tailor retrieval strategies to diagnostic difficulty and input sample informativeness. This limitation leads to excessive and often unnecessary retrieval, impairing computational efficiency and increasing the risk of introducing noise that can degrade diagnostic accuracy. To address this, we propose ICA-RAG (Information Completeness Guided Adaptive Retrieval-Augmented Generation), a novel framework for enhancing RAG reliability in disease diagnosis. ICA-RAG utilizes an adaptive control module to assess the necessity of retrieval based on the input's information completeness. By optimizing retrieval and incorporating knowledge filtering, ICA-RAG better aligns retrieval operations with clinical requirements. Experiments on three Chinese electronic medical record datasets demonstrate that ICA-RAG significantly outperforms baseline methods, highlighting its effectiveness in clinical diagnosis.
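The adaptive control idea (segment the record, label each unit's importance, score completeness, then decide whether to retrieve) can be illustrated with a minimal sketch. This is not the authors' implementation: the segmentation rule, the label set and weights, the mean-based normalization standing in for $I_{norm}$, the threshold, and every function name below are assumptions chosen only to make the control flow concrete.

```python
import re
from typing import Callable, List

# Hypothetical importance labels and weights; the source does not specify them.
LABEL_WEIGHTS = {"high": 1.0, "medium": 0.5, "low": 0.0}


def segment(emr_text: str) -> List[str]:
    """Split a record into text units on sentence-like delimiters (assumed rule)."""
    units = re.split(r"[。；;！？!?\n]+", emr_text)
    return [u.strip() for u in units if u.strip()]


def completeness_score(labels: List[str]) -> float:
    """Assumed stand-in for I_norm: mean label weight, which lies in [0, 1]."""
    if not labels:
        return 0.0
    return sum(LABEL_WEIGHTS[label] for label in labels) / len(labels)


def needs_retrieval(
    emr_text: str,
    classify: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Trigger retrieval only when the input looks informationally incomplete."""
    labels = [classify(unit) for unit in segment(emr_text)]
    return completeness_score(labels) < threshold


def toy_classify(unit: str) -> str:
    """Toy keyword classifier standing in for the trained importance classifier."""
    return "high" if "fever" in unit else "low"


if __name__ == "__main__":
    record = "fever for 3 days; cough; no allergies"
    print(segment(record))
    print(needs_retrieval(record, toy_classify))
```

In this sketch a record whose units are mostly labelled as low importance falls below the threshold and is routed to retrieval, while a record dominated by high-importance units goes directly to generation.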
