UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong
TL;DR
UncertaintyRAG addresses long-context retrieval in RAG by introducing a span-level uncertainty measure, based on the Signal-to-Noise Ratio (SNR), to calibrate similarities between text chunks. It trains a robust retrieval model without supervision, using a contrastive objective whose positives and negatives are derived from span uncertainty, together with scalable data-sampling strategies across diverse datasets. Empirical results show improved performance under distribution shift and strong calibration, reaching state-of-the-art results with only 4% of the training data used by other advanced open-source retrieval models and without fine-tuning the LLM. The result is a lightweight, plug-and-play retrieval component that can be integrated with various LLMs and context window lengths, offering a practical solution for robust long-context QA and generation tasks.
Abstract
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) that utilizes Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. This span uncertainty enhances model calibration, improving robustness and mitigating semantic inconsistencies introduced by random chunking. Leveraging this insight, we propose an efficient unsupervised learning technique to train the retrieval model, alongside an effective data sampling and scaling strategy. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results in distribution-shift settings while using only 4% of the training data required by other advanced open-source retrieval models. Our method demonstrates strong calibration through span uncertainty, leading to improved generalization and robustness in long-context RAG tasks. Additionally, UncertaintyRAG provides a lightweight retrieval model that can be integrated into any large language model, regardless of context window length, without the need for fine-tuning, showcasing the flexibility of our approach.
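The abstract names two ingredients, an SNR-based span uncertainty and a contrastive training objective, without giving their formulas. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: that span uncertainty is computed from per-token log-probabilities, that the SNR is the mean of the per-token self-information divided by its standard deviation, and that the contrastive objective is a generic InfoNCE loss. The names `span_snr` and `info_nce`, and the `temperature` default, are hypothetical.

```python
import math
from statistics import mean, pstdev


def span_snr(token_log_probs, eps=1e-8):
    """Assumed SNR of a span: mean / std of per-token self-information.

    token_log_probs: log-probabilities an LLM assigns to each token
    of the span (hypothetical input; the paper's exact definition
    is not given in the abstract).
    """
    info = [-lp for lp in token_log_probs]  # self-information, -log p
    return mean(info) / (pstdev(info) + eps)


def info_nce(sim_pos, sim_negs, temperature=0.05):
    """Generic InfoNCE loss for one anchor chunk (assumed objective).

    sim_pos:  similarity between the anchor and its positive chunk
    sim_negs: similarities between the anchor and negative chunks
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]  # -log softmax of the positive


# Hypothetical usage: a steadier span (lower variance) gets a higher SNR,
# and a well-separated positive yields a lower contrastive loss.
steady = span_snr([-2.0, -2.1, -1.9])
noisy = span_snr([-0.1, -4.0, -1.9])
loss_good = info_nce(0.9, [0.1, 0.2])
loss_bad = info_nce(0.1, [0.9, 0.2])
```

In a pipeline of the kind the abstract describes, scores such as `span_snr` would presumably be used to select the positive and negative chunks fed into the contrastive loss; how that selection is actually done is not stated here.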
