InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Zhepei Wei, Wei-Lin Chen, Yu Meng

TL;DR

This work tackles the challenge of noisy retrieved content in retrieval-augmented generation by making denoising explicit through self-synthesized rationales. It introduces InstructRAG, a two-step framework: an instruction-following LM first generates denoising rationales that justify the ground-truth answers, and these rationales are then used for in-context learning or supervised fine-tuning. Across five knowledge-intensive benchmarks, InstructRAG consistently outperforms state-of-the-art baselines in both training-free and trainable settings, with an average relative improvement of 8.3% over the best baseline. The approach handles noise robustly, generalizes to out-of-domain tasks, and even transfers to code generation; these findings are supported by ablations and LLM-based evaluation.

Abstract

Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information into the retrieved content, posing a significant challenge to generation quality. Existing RAG methods typically address this challenge by directly predicting final answers despite potentially noisy inputs, resulting in an implicit denoising process that is difficult to interpret and verify. On the other hand, acquiring explicit denoising supervision is often costly and requires significant human effort. In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales. First, we instruct the LM to explain how the ground-truth answer is derived from the retrieved documents. Then, these rationales are used either as demonstrations for in-context learning of explicit denoising or as supervised fine-tuning data to train the model. Compared to standard RAG approaches, InstructRAG requires no additional supervision, allows for easier verification of the predicted answers, and effectively improves generation accuracy. Experiments show that InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios, achieving a relative improvement of 8.3% over the best baseline method on average across five knowledge-intensive benchmarks. Extensive analysis indicates that InstructRAG scales well with increased numbers of retrieved documents and consistently exhibits robust denoising ability even on out-of-domain datasets, demonstrating strong generalizability.
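
The first step described in the abstract, instructing the LM to explain how the ground-truth answer follows from the retrieved documents, can be sketched as a small prompt builder. This is an illustrative sketch only, not the authors' implementation: the prompt wording, the function names `build_rationale_prompt` and `synthesize_rationale`, and the `generate` callable (any text-in/text-out LM wrapper) are assumptions introduced here.

```python
from typing import Callable, Sequence


def build_rationale_prompt(question: str, documents: Sequence[str], answer: str) -> str:
    """Assemble a prompt asking an LM to justify a known answer.

    The wording is hypothetical; the abstract only states that the LM is
    instructed to explain how the ground-truth answer is derived from the
    retrieved (possibly noisy) documents.
    """
    doc_block = "\n".join(
        f"Document {i}: {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        f"{doc_block}\n\n"
        f"Question: {question}\n"
        f"Ground-truth answer: {answer}\n\n"
        "Explain how the answer can be derived from the documents above, "
        "noting which documents are relevant and which are not."
    )


def synthesize_rationale(
    generate: Callable[[str], str],
    question: str,
    documents: Sequence[str],
    answer: str,
) -> str:
    """Run the supplied LM callable on the prompt and return the trimmed rationale."""
    return generate(build_rationale_prompt(question, documents, answer)).strip()


if __name__ == "__main__":
    # Stand-in for a real LM call so the sketch runs offline.
    def fake_lm(prompt: str) -> str:
        return " Document 2 states the answer directly; Document 1 is off-topic. "

    docs = ["Paris hosts the Louvre.", "Canberra is the capital of Australia."]
    print(synthesize_rationale(fake_lm, "What is the capital of Australia?", docs, "Canberra"))
```

Passing the LM in as a callable keeps the sketch independent of any particular inference library; in practice `generate` would wrap whichever instruction-tuned model serves as the rationale generator.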

Paper Structure

This paper contains 21 sections, 1 equation, 7 figures, 12 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison between vanilla RAG and our InstructRAG. In vanilla RAG, the model is tasked to directly predict answers given user queries and potentially noisy retrieved documents, without explicit denoising processes or explanations for how the answer is derived. In contrast, our proposed InstructRAG generates rationales that explicitly denoise the retrieved documents and justify the predicted answers, enhancing both the generation accuracy and trustworthiness.
  • Figure 2: An overview of InstructRAG. In step one, given the question $q$, retrieved documents $\{d_1, \cdots, d_K\}$ and ground-truth answer $a$ from the training set, we prompt an instruction-tuned LM (i.e., the rationale generator $\mathcal{M}_\phi$) to generate a rationale $r$ that explains how the answer can be derived from the potentially noisy input. In step two, we utilize the synthesized rationales from the first step to guide the LM (i.e., the rationale learner $\mathcal{M}_\theta$) to explicitly learn denoising of the retrieved documents, either through in-context learning or supervised learning. By default, we use the same model for both $\mathcal{M}_\phi$ and $\mathcal{M}_\theta$, but they can be instantiated with different models as well (see the ablation study).
  • Figure 3: Impact of the number of demonstrations and retrieved documents. (a) Demonstration sensitivity study of InstructRAG-ICL. (b) Noise robustness study of InstructRAG-ICL. (c) Noise robustness study of InstructRAG-FT.
  • Figure 4: Generalizing InstructRAG from a source-domain task to a target-domain task, where ID and OOD denote the in-domain and out-of-domain settings. (a) PopQA (short-form QA task) as source domain and ASQA (long-form QA task) as target domain. (b) ASQA as source domain and PopQA as target domain. (c) PopQA (single-hop QA task) as source domain and 2WikiMultiHopQA (multi-hop QA task) as target domain. We adopt few-shot demonstrations with instructions and vanilla supervised fine-tuning as the training-free and trainable baselines, respectively.
  • Figure 5: Visualization of model attention from answer to retrieved documents on a random sample from the ASQA task, where Doc 2 is the only relevant document that contains the correct answer.
  • ...and 2 more figures
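
The second step in the Figure 2 caption, where synthesized rationales guide the rationale learner through either in-context learning or supervised learning, might be prepared as below. This is a hedged sketch under stated assumptions: the `RationaleExample` container, the prompt layout, the `prompt`/`completion` record keys, and the choice to include documents inside each demonstration are all illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RationaleExample:
    """One training item: question q, retrieved documents d_1..d_K, rationale r."""
    question: str
    documents: Sequence[str]
    rationale: str


def format_input(question: str, documents: Sequence[str]) -> str:
    """Render documents and question in a (hypothetical) shared layout."""
    docs = "\n".join(f"Document {i}: {d}" for i, d in enumerate(documents, start=1))
    return f"{docs}\n\nQuestion: {question}\nResponse:"


def build_icl_prompt(
    demos: List[RationaleExample], question: str, documents: Sequence[str]
) -> str:
    """Training-free variant: prepend rationale demonstrations to the test query."""
    shots = [f"{format_input(d.question, d.documents)} {d.rationale}" for d in demos]
    return "\n\n".join(shots + [format_input(question, documents)])


def build_sft_record(example: RationaleExample) -> Dict[str, str]:
    """Trainable variant: the rationale becomes the target text for fine-tuning."""
    return {
        "prompt": format_input(example.question, example.documents),
        "completion": " " + example.rationale,
    }


if __name__ == "__main__":
    demo = RationaleExample(
        question="Who wrote Dracula?",
        documents=["Dracula is an 1897 novel by Bram Stoker.", "Vampires appear in folklore."],
        rationale="Document 1 names the author; Document 2 is irrelevant. Answer: Bram Stoker.",
    )
    print(build_icl_prompt([demo], "Who wrote Frankenstein?", ["Frankenstein was written by Mary Shelley."]))
    print(build_sft_record(demo))
```

Sharing `format_input` between both variants keeps the demonstrations and the fine-tuning data in the same layout, so the same rationales can be reused in either setting.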