Retrieval-Augmented Generation with Estimation of Source Reliability
Jeongyeon Hwang, Junyoung Park, Hyejin Park, Dongwoo Kim, Sangdon Park, Jungseul Ok
TL;DR
This work addresses factual accuracy in Retrieval-Augmented Generation (RAG) when sources differ in reliability. It introduces Reliability-Aware RAG (RA-RAG), which iteratively estimates the reliability of $N$ sources to assign each a weight $v_i$, uses $\kappa$-Reliable and Relevant Source Selection ($\kappa$-RRSS) to retrieve from a small, trustworthy subset, and aggregates the per-source outputs by weighted majority voting (WMV). Its key innovations are automated reliability estimation through cross-source queries, which removes the need for manual fact-checking, and a pipeline that filters per-source outputs with AlignScore and groups them with a semantic clustering function $\mathcal{C}(\cdot)$ so they can be grounded and combined efficiently. On synthetic benchmarks and real-world sources, RA-RAG consistently surpasses baselines, closely approaches Oracle WMV as the source pool grows, and produces reliability estimates that correlate strongly with the true values (e.g., Pearson correlation coefficient, PCC, and Spearman rank correlation coefficient, SRCC, near 0.99), while $\kappa$-RRSS reduces inference cost. The approach makes RAG systems more practical for real-world knowledge bases by delivering more grounded, trustworthy answers with scalable retrieval.
Abstract
Retrieval-Augmented Generation (RAG) is an effective approach to enhance the factual accuracy of large language models (LLMs) by retrieving information from external databases, which are typically composed of diverse sources, to supplement the limited internal knowledge of LLMs. However, standard RAG risks retrieving incorrect information because it relies solely on the relevance between a query and a document and overlooks the heterogeneous reliability of these sources. To address this issue, we propose Reliability-Aware RAG (RA-RAG), a new multi-source RAG framework that estimates the reliability of sources and uses this information to prioritize highly reliable and relevant documents, yielding more robust and accurate responses. Specifically, RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-$\kappa$ reliable and relevant sources and aggregates their information using weighted majority voting (WMV); the selective retrieval ensures scalability without compromising performance. Comprehensive experiments show that RA-RAG consistently outperforms baselines in scenarios with heterogeneous source reliability while scaling efficiently as the number of sources increases. Furthermore, we demonstrate the ability of RA-RAG to estimate the reliability of real-world sources, highlighting its practical applicability. Our code and data are available at https://github.com/ml-postech/RA-RAG.
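To make the two-step idea concrete, the following is a minimal Python sketch of selecting the top-$\kappa$ reliable and relevant sources and then aggregating their answers by weighted majority voting. It is an illustration under stated assumptions, not the released implementation: the function names `select_top_kappa` and `weighted_majority_vote`, the boolean relevance flags, the use of the raw reliability estimate as the vote weight, and the toy data are all hypothetical, and answers are assumed to be already normalized to canonical strings.

```python
from collections import defaultdict


def select_top_kappa(reliability, relevant, kappa):
    """Return up to `kappa` relevant sources, ordered by estimated reliability.

    `reliability` maps source -> estimated reliability in [0, 1].
    `relevant` maps source -> True if the source has a document relevant
    to the query (assumption: relevance is a simple boolean flag here).
    """
    candidates = [s for s in reliability if relevant.get(s, False)]
    candidates.sort(key=lambda s: reliability[s], reverse=True)
    return candidates[:kappa]


def weighted_majority_vote(answers, weights):
    """Return the answer with the largest total weight.

    `answers` maps source -> answer string; `weights` maps source -> weight.
    Assumption: the weight is the raw reliability estimate; a different
    weighting function could be substituted without changing the structure.
    """
    scores = defaultdict(float)
    for source, answer in answers.items():
        scores[answer] += weights[source]
    return max(scores, key=scores.get)


# Toy example (hypothetical data): one reliable source disagrees with
# two less reliable ones, and a fourth source is not relevant to the query.
reliability = {"A": 0.9, "B": 0.4, "C": 0.3, "D": 0.8}
relevant = {"A": True, "B": True, "C": True, "D": False}
per_source_answers = {"A": "Paris", "B": "Lyon", "C": "Lyon", "D": "Nice"}

selected = select_top_kappa(reliability, relevant, kappa=3)
answers = {s: per_source_answers[s] for s in selected}
final_answer = weighted_majority_vote(answers, reliability)
```

In this toy case an unweighted majority vote would return "Lyon" (two votes to one), whereas the weighted vote returns "Paris" because source A's weight of 0.9 exceeds the combined 0.7 of sources B and C.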
