Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation

Sha Li, Naren Ramakrishnan

TL;DR

This paper tackles the unreliability and inefficiency of vanilla retrieval-augmented generation by introducing Oreo, a plug-in context reconstructor that refines retrieved chunks into concise, query-focused context. Oreo employs a retrieve-reconstruct-then-generate pipeline trained in three stages—supervised fine-tuning, contrastive multitask learning, and reinforcement learning alignment—to align reconstructed context with generator needs. It demonstrates consistent gains on single- and multi-hop open-domain QA tasks, while substantially reducing input length and latency and showing robustness to noise and order perturbations. The work advances practical RAG systems by enabling seamless integration with existing retrievers and generators, with strong implications for scalable, factual QA in real-world settings.

Abstract

Retrieval-Augmented Generation (RAG) aims to augment the capabilities of Large Language Models (LLMs) by retrieving and incorporating external documents or chunks prior to generation. However, even with improved retriever relevance, retrieval can still bring in erroneous or contextually distracting information, undermining the effectiveness of RAG in downstream tasks. We introduce a compact, efficient, and pluggable module designed to refine retrieved chunks before using them for generation. The module aims to extract and reorganize the most relevant and supportive information into a concise, query-specific format. Through a three-stage training paradigm (supervised fine-tuning, contrastive multi-task learning, and reinforcement learning-based alignment), it prioritizes critical knowledge and aligns it with the generator's preferences. This approach enables LLMs to produce outputs that are more accurate, reliable, and contextually appropriate.
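
To make the plug-in idea concrete, the following is a minimal sketch of where a context reconstructor sits between a retriever and a generator. It is not the authors' implementation: Oreo is a trained model, whereas `toy_reconstruct` here is a hypothetical stand-in that merely keeps query-overlapping sentences and drops duplicates. The names `toy_retrieve`, `toy_reconstruct`, `build_prompt`, `CORPUS`, `QUERY`, and `STOPWORDS` are all assumptions introduced for illustration.

```python
import re
from typing import Callable, List

# Assumed interfaces: any retriever/generator with these shapes can be swapped in.
Retriever = Callable[[str, int], List[str]]
Reconstructor = Callable[[str, List[str]], str]
Generator = Callable[[str, str], str]

# Hypothetical toy data, not from the paper.
CORPUS = [
    "Paris is the capital of France. The city hosts many museums.",
    "France borders Spain. Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
QUERY = "What is the capital of France?"
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in"}


def _tokens(text: str) -> set:
    """Lowercased word set of a string."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, k: int) -> List[str]:
    """Rank corpus chunks by word overlap with the query (stand-in retriever)."""
    ranked = sorted(CORPUS, key=lambda c: len(_tokens(query) & _tokens(c)), reverse=True)
    return ranked[:k]


def toy_reconstruct(query: str, chunks: List[str]) -> str:
    """Stand-in for the reconstructor: keep each query-relevant sentence once."""
    keywords = _tokens(query) - STOPWORDS
    kept: List[str] = []
    for chunk in chunks:
        for sentence in re.split(r"(?<=[.!?])\s+", chunk):
            if keywords & _tokens(sentence) and sentence not in kept:
                kept.append(sentence)
    return " ".join(kept)


def build_prompt(query: str, context: str) -> str:
    """Stand-in for the generator: returns the prompt an LLM would receive."""
    return f"Context: {context}\nQuestion: {query}"


def vanilla_rag(query: str, retrieve: Retriever, generate: Generator, k: int = 5) -> str:
    """Retrieve-then-generate: chunks are concatenated as-is."""
    return generate(query, "\n".join(retrieve(query, k)))


def rag_with_reconstructor(query: str, retrieve: Retriever, reconstruct: Reconstructor,
                           generate: Generator, k: int = 5) -> str:
    """Retrieve-reconstruct-then-generate: context is refined before generation."""
    return generate(query, reconstruct(query, retrieve(query, k)))


if __name__ == "__main__":
    print(vanilla_rag(QUERY, toy_retrieve, build_prompt, k=2))
    print(rag_with_reconstructor(QUERY, toy_retrieve, toy_reconstruct, build_prompt, k=2))
```

Because the reconstructor only consumes the retriever's output and produces the generator's input, neither of the other two components needs to change, which is the sense in which the module is pluggable.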

Paper Structure

This paper contains 32 sections, 5 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: An example comparing vanilla RAG versus RAG with Oreo highlights the impact of redundant and scattered information within the retrieved document chunks. In the vanilla RAG setup, even though the retrieved chunks contain contextually relevant information to the query, the presence of distractions and redundancy misleads the downstream LLM, causing it to misinterpret temporal dependencies and generate an incorrect answer. In contrast, Oreo effectively captures the essential evidence and reconstructs the context, leading to accurate and correct responses.
  • Figure 2: The framework of Oreo. (a) outlines the process of data collection and curation (top). (b) demonstrates the three-stage training, which comprises the supervised fine-tuning (SFT), contrastive multi-task learning (CML) and reinforcement learning (RL) alignment (middle). (c) illustrates the application of Oreo, comparing against the vanilla RAG (bottom).
  • Figure 3: Performance on five datasets using the query without retrieval, the original full concatenation of chunks, passage-level filtering, and context generated by Oreo with and without RL. 2WQA_k denotes retrieving the top-k documents for the 2WQA dataset. The downstream generator is Flan-T5. Performance on PopQA, NQ, and TriviaQA is measured by Exact Match; performance on HotpotQA and 2WQA is measured by unigram F1.
  • Figure 4: Performance comparison with 95% confidence intervals against baselines using OPT-IML as the generator. Specifically, Passage denotes passage-level filtering, CXMI refers to filtering guided by conditional cross-mutual information, and Full represents the use of original content without any filtering. PopQA, NQ, and TriviaQA are evaluated with Exact Match scores, while HotpotQA and 2WQA use unigram F1.
  • Figure 5: Left (a) - Comparison of number of input tokens for generator and QA performance across different context types. Right (b) - Comparison of end-to-end inference time (measured in seconds) by using different types of context.
  • ...and 3 more figures
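
The captions above report Exact Match and unigram F1. As a rough sketch of how these two metrics are commonly computed for open-domain QA, the code below uses the widely adopted SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). This normalization is an assumption: the excerpt does not state which exact variant the paper uses, and the function names are illustrative.

```python
import re
import string
from collections import Counter
from typing import List

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def unigram_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of unigram precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact Match suits short factoid answers, while unigram F1 gives partial credit when a prediction overlaps with the reference without matching it exactly.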