Aligning Extraction and Generation for Robust Retrieval-Augmented Generation

Hwanjun Song, Jeonghwan Choi, Minseok Kim

TL;DR

Ext2Gen makes retrieval-augmented generation more robust by training LLMs to extract query-relevant evidence from noisy retrievals before generating answers, which mitigates the effects of misplaced relevant chunks and retrieval noise. The method jointly optimizes extraction and generation through preference alignment on pairwise feedback derived from multiple LLMs and QA metrics, removing the need for pre-generation compression. Empirical results show that Ext2Gen is substantially more robust than compression-based baselines, benefits further from improved retrieval techniques, and generalizes well to other backbones and to deployment in real RAG pipelines. The work demonstrates that generation-side enhancements provide gains complementary to retrieval improvements in practical RAG systems.

Abstract

Retrieval-augmented generation (RAG) enhances LLMs with external knowledge, yet generation remains vulnerable to retrieval-induced noise and uncertain placement of relevant chunks, often causing hallucinations. We present Ext2Gen, an extract-then-generate framework that strengthens LLMs via joint evidence selection and answer generation, dynamically identifying query-relevant content while suppressing noise, thereby removing the need for any independent pre-generation compression module. Optimized through preference alignment with well-curated pairwise feedback, Ext2Gen produces accurate and faithful answers even under noisy or imprecise retrieval. Experiments demonstrate that it substantially enhances the robustness of the generation backbone and yields greater performance gains than methods relying on independent compression models (e.g., Recomp, CompAct, and EXIT). It further benefits from improved retrieval techniques such as query rewriting, underscoring that generation-side enhancements address limitations that retrieval alone cannot overcome.

Paper Structure

This paper contains 41 sections, 3 equations, 3 figures, 14 tables.

Figures (3)

  • Figure 1: Overview of Ext2Gen. We simulate noisy RAG inputs by mixing relevant and irrelevant chunks with LLM-generated queries. Multiple LLMs generate answers, and pairwise feedback is derived to train the LLM backbone for robust generation.
  • Figure 2: Robustness to (left) relevant chunk position (moving down as it shifts right) and (right) the number of added irrelevant chunks (increasing noise level to the right). Results are based on the Llama3.1-8b-instruct backbone.
  • Figure 3: Accuracy of the Llama3.1-8b backbone fine-tuned with Ext2Gen in a RAG environment, evaluated across three retrieval approaches: naive dense retrieval (Naive) and two enhanced variants that use query rewriting, HyDE (Gao et al., 2023) and MuGI (Zhang et al., 2024).
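
The captions of Figures 1 and 2 describe noisy RAG inputs built by mixing a relevant chunk with irrelevant ones, where the position of the relevant chunk and the number of irrelevant chunks are the two factors varied. The following is a minimal sketch of how such an input could be assembled. It is not the authors' code: the function names `build_noisy_input` and `format_prompt`, the prompt layout, and the use of uniform random sampling for distractors are all assumptions made for illustration.

```python
import random


def build_noisy_input(relevant, irrelevant_pool, num_noise, position, seed=0):
    """Return a chunk list with `num_noise` distractors and the relevant
    chunk placed at index `position` (0 = top of the list).

    Hypothetical helper: the sampling scheme is an assumption, not the
    procedure used in the paper.
    """
    if num_noise > len(irrelevant_pool):
        raise ValueError("not enough irrelevant chunks in the pool")
    if not 0 <= position <= num_noise:
        raise ValueError("position must lie within the final chunk list")
    rng = random.Random(seed)
    # Draw distractors without replacement, then insert the relevant chunk.
    chunks = rng.sample(irrelevant_pool, num_noise)
    chunks.insert(position, relevant)
    return chunks


def format_prompt(query, chunks):
    """Render numbered chunks followed by the query (assumed layout)."""
    lines = [f"Chunk {i + 1}: {chunk}" for i, chunk in enumerate(chunks)]
    lines.append(f"Query: {query}")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = ["Paris hosts the Louvre.", "Cats are mammals.", "Rust is a language."]
    chunks = build_noisy_input(
        relevant="Mount Everest is 8,849 m tall.",
        irrelevant_pool=pool,
        num_noise=3,
        position=2,
    )
    print(format_prompt("How tall is Mount Everest?", chunks))
```

Sweeping `position` with `num_noise` held fixed corresponds to the left panel of Figure 2, and sweeping `num_noise` corresponds to the right panel.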