Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation
Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen
TL;DR
FaviComp addresses the challenge of integrating multiple pieces of retrieved evidence in retrieval-augmented generation by introducing a training-free, inference-time compression method that makes evidence more familiar to the target LM. It does so via ensemble decoding, which blends the token probabilities of the compression model and the target model, thereby reducing the target model's perplexity on the compressed context while incorporating its parametric knowledge. Across five open-domain QA datasets, FaviComp outperforms most baselines and even surpasses Gold Compression on at least one multi-document dataset, with optimal performance near an ensemble weight of $\alpha=0.5$. The method is model- and prompt-agnostic, scalable across various RAG pipelines, and delivers substantial accuracy improvements at high compression rates, which makes it practical for knowledge-intensive tasks.
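To make the ensemble decoding idea concrete, the following is a minimal sketch of a single decoding step, not the authors' implementation. It assumes the two next-token distributions are combined as a weighted sum of log-probabilities with weight alpha on the compression model; the summary above only says the probabilities are blended, so the exact combination rule is an assumption. The function name `ensemble_step` and the toy distributions are hypothetical.

```python
import math


def ensemble_step(comp_probs, target_probs, alpha=0.5):
    """Pick the next token from two next-token distributions.

    comp_probs / target_probs map token -> probability (toy stand-ins for
    the compression model and the target model at one decoding step).
    Assumption: score(tok) = alpha * log p_comp + (1 - alpha) * log p_target.
    """
    eps = 1e-12  # avoids log(0) for tokens missing from one distribution
    vocab = sorted(set(comp_probs) | set(target_probs))
    scores = {
        tok: alpha * math.log(comp_probs.get(tok, 0.0) + eps)
        + (1.0 - alpha) * math.log(target_probs.get(tok, 0.0) + eps)
        for tok in vocab
    }
    # Greedy choice: the token with the highest blended score.
    return max(vocab, key=lambda tok: scores[tok])


# Toy distributions (hypothetical values, for illustration only).
comp = {"Paris": 0.5, "Lyon": 0.4, "the": 0.1}
target = {"Paris": 0.2, "Lyon": 0.1, "the": 0.7}

print(ensemble_step(comp, target, alpha=1.0))  # compression model only
print(ensemble_step(comp, target, alpha=0.0))  # target model only
print(ensemble_step(comp, target, alpha=0.5))  # equal blend
```

With alpha set to 1 the step reduces to greedy decoding from the compression model, and with alpha set to 0 it reduces to greedy decoding from the target model. Intermediate values favor tokens that both models find likely, which is the intuition behind lowering the target model's perplexity on the compressed evidence.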
Abstract
Retrieval-augmented generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieved from external sources. However, it often struggles to cope with inconsistent and irrelevant information that can distract the LM from its task, especially when multiple pieces of evidence are required. While compressing the retrieved evidence with a compression model aims to address this issue, the compressed evidence may still be unfamiliar to the target model used for downstream tasks, which may then fail to use the evidence effectively. We propose FaviComp (Familiarity-Aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model while seamlessly integrating parametric knowledge from the model. Experimental results show that FaviComp consistently outperforms most recent evidence compression baselines across multiple open-domain QA datasets, improving accuracy by up to 28.1% while achieving high compression rates. Additionally, we demonstrate the effective integration of both parametric and non-parametric knowledge during evidence compression.
