Generator-Retriever-Generator Approach for Open-Domain Question Answering

Abdelrahman Abdallah, Adam Jatowt

TL;DR

The paper addresses open-domain QA by combining document generation and retrieval in a Generator-Retriever-Generator (GRG) pipeline. A first LLM generates contextual documents for a given question while, in parallel, a dual-encoder retriever fetches relevant documents from an external corpus; a second LLM then produces the final answer conditioned on both sources. GRG outperforms the state-of-the-art generate-then-read (GENREAD) and retrieve-then-read (RFiD) pipelines, with exact-match (EM) gains of at least +5.2, +4.2, and +1.6 on TriviaQA, Natural Questions (NQ), and WebQ, respectively, which indicates that jointly using generated and retrieved documents improves answer quality. The approach is accompanied by an implemented system, released datasets and checkpoints, and ablation studies.

Abstract

Open-domain question answering (QA) tasks usually require the retrieval of relevant information from a large corpus to generate accurate answers. We propose a novel approach called Generator-Retriever-Generator (GRG) that combines document retrieval techniques with a large language model (LLM) by first prompting the model to generate contextual documents based on a given question. In parallel, a dual-encoder network retrieves documents that are relevant to the question from an external corpus. The generated and retrieved documents are then passed to a second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD), improving their performance by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints at https://github.com/abdoelsayed2016/GRG.
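
The three-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`embed`, `retrieve`, `grg_answer`, `toy_generator`, `toy_reader`) are hypothetical, the bag-of-words `embed` is a toy stand-in for a learned dual encoder, and both LLM stages are replaced by placeholder callables so the example runs without any model.

```python
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def score(question_vec: Counter, doc_vec: Counter) -> int:
    """Dot product of the two vectors, mirroring dual-encoder similarity."""
    return sum(count * doc_vec[token] for token, count in question_vec.items())


def retrieve(question: str, corpus: List[str], k: int) -> List[str]:
    """Return the k corpus documents with the highest similarity to the question."""
    q_vec = embed(question)
    ranked = sorted(corpus, key=lambda doc: score(q_vec, embed(doc)), reverse=True)
    return ranked[:k]


def grg_answer(
    question: str,
    corpus: List[str],
    generate_docs: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
    k: int = 2,
) -> str:
    """Generator -> Retriever -> Generator: the reader sees both document sources."""
    generated = generate_docs(question)        # stage 1: first LLM (placeholder here)
    retrieved = retrieve(question, corpus, k)  # stage 2: dual-encoder retrieval (toy)
    return read(question, generated + retrieved)  # stage 3: second LLM (placeholder)


def toy_generator(question: str) -> List[str]:
    """Placeholder for the first LLM: returns one synthetic context document."""
    return [f"Generated context for: {question}"]


def toy_reader(question: str, docs: List[str]) -> str:
    """Placeholder for the second LLM: just reports the evidence it was given."""
    return " | ".join(docs)


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "Berlin is the capital of Germany.",
    ]
    print(grg_answer("What is the capital of France?", corpus, toy_generator, toy_reader, k=1))
```

In a real system, `toy_generator` and `toy_reader` would be replaced by calls to the two LLMs and `embed` by trained question and passage encoders; the only point of the sketch is that the final stage receives the concatenation of generated and retrieved documents.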

Paper Structure

This paper contains 33 sections, 4 equations, 3 figures, and 12 tables.

Figures (3)

  • Figure 1: Simplified diagram illustrating the idea behind the Generator-Retriever-Generator approach.
  • Figure 2: Architecture diagram illustrating the Generator-Retriever-Generator (GRG) approach, which combines document retrieval techniques and large language models to generate contextual documents and retrieve relevant information for answering questions.
  • Figure 3: Performance Comparison (EM) of DPR+LLaMA and InstructGPT+LLaMA models on TQA and NQ.
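
Figure 3 reports EM (exact match). As a rough illustration of how such a score is commonly computed, the sketch below applies the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) before comparing strings. The exact normalization used in the paper is not stated here, so treat these rules and the function names as assumptions.

```python
import re
import string
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: List[str], references: List[List[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a gold answer."""
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

For example, `em_score(["Paris", "London"], [["paris"], ["Berlin"]])` evaluates to `50.0`, since only the first prediction matches after normalization.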