Better RAG using Relevant Information Gain

Marc Pickett, Jeremy Hartman, Ayan Kumar Bhowmick, Raquib-ul Alam, Aditya Vempaty

TL;DR

The paper tackles the memory bottleneck of retrieval-augmented generation by proposing a principled metric, relevant information gain, and a corresponding Dartboard retrieval algorithm that greedily selects $k$ passages to maximize this metric. By avoiding explicit diversity terms and instead optimizing total information gain $s(G,q,A,\sigma)$, diversity emerges naturally and yields state-of-the-art performance on the RGB benchmark for both retrieval quality and end-to-end QA. The approach generalizes traditional retrieval methods like KNN and MMR, and demonstrates robust performance across tasks with parameter settings that balance relevance and diversity. The findings suggest practical impact for improving RAG systems with more informative and diverse retrieved passages, while also outlining limitations and directions for future work, such as runtime considerations and broader benchmarking.

Abstract

A common way to extend the memory of large language models (LLMs) is by retrieval augmented generation (RAG), which inserts text retrieved from a larger memory into an LLM's context window. However, the context window is typically limited to several thousand tokens, which limits the number of retrieved passages that can inform a model's response. For this reason, it's important to avoid occupying context window space with redundant information by ensuring a degree of diversity among retrieved passages. At the same time, the information should also be relevant to the current task. Most prior methods that encourage diversity among retrieved results, such as Maximal Marginal Relevance (MMR), do so by incorporating an objective that explicitly trades off diversity and relevance. We propose a novel simple optimization metric based on relevant information gain, a probabilistic measure of the total information relevant to a query for a set of retrieved results. By optimizing this metric, diversity organically emerges from our system. When used as a drop-in replacement for the retrieval component of a RAG system, this method yields state-of-the-art performance on question answering tasks from the Retrieval Augmented Generation Benchmark (RGB), outperforming existing metrics that directly optimize for relevance and diversity.
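
To make the idea of greedily maximizing a set-level score concrete, here is a minimal sketch in Python. It is not the authors' implementation: this section does not spell out the scoring function, so the sketch assumes that each passage is weighted as a possible target by a Gaussian kernel on its cosine distance to the query, and that a selected set is credited, for each target, with the best kernel value any selected passage achieves. The names `gaussian_kernel`, `cosine_distance_matrix`, and `greedy_information_gain` are hypothetical.

```python
import numpy as np

def gaussian_kernel(dist, sigma):
    """Unnormalized Gaussian weight for a distance (assumed kernel)."""
    return np.exp(-0.5 * (dist / sigma) ** 2)

def cosine_distance_matrix(X, Y):
    """Pairwise cosine distances between the rows of X and the rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T

def greedy_information_gain(query, passages, k, sigma):
    """Greedily pick k passage indices under the assumed set score.

    Assumed score of a set G: sum over targets t of
    weight(t | query) * max over g in G of kernel(distance(t, g)).
    """
    query = np.asarray(query, dtype=float)
    passages = np.asarray(passages, dtype=float)
    n = len(passages)

    # Weight of each passage as the possible "true target" of the query.
    d_query = cosine_distance_matrix(query[None, :], passages)[0]
    p_target = gaussian_kernel(d_query, sigma)
    p_target = p_target / p_target.sum()

    # cover[t, g]: how well candidate g covers target t.
    cover = gaussian_kernel(cosine_distance_matrix(passages, passages), sigma)

    best_cover = np.zeros(n)            # best coverage of each target so far
    chosen = np.zeros(n, dtype=bool)    # mask of already selected passages
    selected = []
    for _ in range(min(k, n)):
        # Set score if candidate g were added, for every g at once.
        gains = p_target @ np.maximum(best_cover[:, None], cover)
        gains = np.where(chosen, -np.inf, gains)
        g = int(np.argmax(gains))
        selected.append(g)
        chosen[g] = True
        best_cover = np.maximum(best_cover, cover[:, g])
    return selected
```

Under these assumptions there is no explicit diversity term: a near-duplicate of an already selected passage barely raises the per-target maximum, so it adds little to the score and tends to lose out to a passage that covers a different region around the query.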

Paper Structure

This paper contains 17 sections, 3 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: A visualization of Dartboard. The query is represented by the red star, and all points are represented by blue dots. The five dots highlighted with a grey background are the query's 5 nearest neighbors, while the dots circled in green are the five points selected by the Dartboard algorithm (numbered in the order in which the greedy algorithm selected them). The concentric red circles are spaced at multiples of $\sigma$, which represents the standard deviation of our uncertainty about the query's accuracy. Note the possible redundancy of naive k-nearest-neighbors retrieval, which ignores points above or to the right of the query.
  • Figure 2: Performance on end-to-end QA task (simple) as parameters vary. For Dartboard, we show its performance as $\sigma$ varies. For MMR, we show its performance as the diversity parameter varies.
  • Figure 3: We show the diversity in the set of retrieved passages from RGB for both Dartboard and MMR (for $k=5$), where diversity is one minus the average cosine similarity between pairs of retrieved passages. For both methods, diversity increases as the value of the controlling parameter increases ($\sigma$ for Dartboard and the diversity parameter for MMR).
  • Figure 4: Scatter plot of NDCG score against final end-to-end performance on the QA task. The best-performing methods are in the upper right-hand side of the plot.
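
The diversity measure described in the Figure 3 caption (one minus the average cosine similarity between pairs of retrieved passages) can be sketched as follows. This is an illustrative reading rather than the paper's code: it assumes the average is taken over unordered pairs of distinct passages, which the caption does not specify, and the function name `retrieval_diversity` is hypothetical.

```python
import numpy as np

def retrieval_diversity(embeddings):
    """One minus the mean cosine similarity over distinct passage pairs.

    Assumption: self-pairs are excluded and each unordered pair counts once.
    """
    X = np.asarray(embeddings, dtype=float)
    if X.shape[0] < 2:
        raise ValueError("need at least two passages to form a pair")
    # Normalize rows so that dot products are cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    # Indices of the strict upper triangle: every unordered pair once.
    upper = np.triu_indices(X.shape[0], k=1)
    return 1.0 - sims[upper].mean()
```

With this reading, a set of passages whose embeddings all point in the same direction scores close to 0, and a set of mutually orthogonal embeddings scores 1.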