After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in Retrieval-Augmented Generation
Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi
TL;DR
This work tackles trustworthiness in retrieval-augmented generation (RAG) by focusing on the step before generation, where a model must balance its internal knowledge against retrieved knowledge. It introduces TRD, a 36,266-question benchmark spanning four RAG scenarios, and BRIDGE, a unified framework that combines a Soft Bias Allocator, a Maximum Soft-bias Decision Tree, and a reflection mechanism to select response strategies dynamically. Across TRD and real-world-like benchmarks, BRIDGE achieves 5-15% higher accuracy than baselines and maintains balanced performance across FA, FI, FE, and RA, including strong refusal behavior in uncertain cases. The approach offers a practical, interpretable way to make RAG more trustworthy in real-world applications, with implications for safety-aware information retrieval and generation.
Abstract
Retrieval-augmented generation (RAG) is a promising paradigm, yet its trustworthiness remains a critical concern. A major vulnerability arises prior to generation: models often fail to balance parametric (internal) and retrieved (external) knowledge, particularly when the two sources conflict or are unreliable. To analyze these scenarios comprehensively, we construct the Trustworthiness Response Dataset (TRD) with 36,266 questions spanning four RAG settings. We find that existing approaches each address an isolated scenario (prioritizing one knowledge source, naively merging both, or refusing to answer) but lack a unified framework that handles different real-world conditions simultaneously. We therefore propose the BRIDGE framework, which dynamically determines a comprehensive response strategy for large language models (LLMs). BRIDGE uses an adaptive weighting mechanism, named soft bias, to guide knowledge collection, followed by a Maximum Soft-bias Decision Tree that evaluates the collected knowledge and selects the optimal response strategy (trust internal knowledge, trust external knowledge, or refuse). Experiments show that BRIDGE outperforms baselines by 5-15% in accuracy while maintaining balanced performance across all scenarios. Our work provides an effective solution for trustworthy LLM responses in real-world RAG applications.
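To make the strategy-selection idea concrete, the following is a minimal illustrative sketch, not the BRIDGE implementation. It assumes that soft bias can be summarized as one weight per knowledge source, and that the decision reduces to following the larger weight unless both fall below a cutoff. The names `SoftBias`, `select_strategy`, and `min_confidence`, as well as the threshold rule itself, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The three response strategies named in the abstract."""
    TRUST_INTERNAL = "trust_internal"
    TRUST_EXTERNAL = "trust_external"
    REFUSE = "refuse"


@dataclass(frozen=True)
class SoftBias:
    """Hypothetical per-source weights in [0, 1] (an assumption, not the paper's definition)."""
    internal: float
    external: float


def select_strategy(bias: SoftBias, min_confidence: float = 0.5) -> Strategy:
    """Pick a strategy from the soft-bias weights using an assumed threshold rule."""
    # Assumed rule: refuse when neither source is weighted strongly enough.
    if max(bias.internal, bias.external) < min_confidence:
        return Strategy.REFUSE
    # Otherwise follow the source with the larger weight.
    # Ties go to external knowledge, which is an arbitrary choice for this sketch.
    if bias.internal > bias.external:
        return Strategy.TRUST_INTERNAL
    return Strategy.TRUST_EXTERNAL


if __name__ == "__main__":
    print(select_strategy(SoftBias(internal=0.8, external=0.3)))
    print(select_strategy(SoftBias(internal=0.2, external=0.3)))
```

In this toy version a single cutoff stands in for the knowledge evaluation that the Maximum Soft-bias Decision Tree performs; the actual framework also includes a reflection mechanism that this sketch omits.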
