Generator-Retriever-Generator Approach for Open-Domain Question Answering

Abdelrahman Abdallah; Adam Jatowt

TL;DR

The paper addresses open-domain QA by combining document generation and retrieval in a Generator-Retriever-Generator (GRG) pipeline. A first LLM generates contextual documents for a given question, while a dual-encoder retriever fetches relevant documents from an external corpus; a second LLM then produces the final answer conditioned on both sources. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD) by at least +5.2, +4.2, and +1.6 EM on TriviaQA, Natural Questions, and WebQ, respectively, which suggests that jointly using generated and retrieved documents improves answer quality. The paper includes ablations, and the authors release code, datasets, and checkpoints.

Abstract

Open-domain question answering (QA) tasks usually require retrieving relevant information from a large corpus to generate accurate answers. We propose a novel approach called Generator-Retriever-Generator (GRG) that combines document retrieval techniques with a large language model (LLM) by first prompting the model to generate contextual documents based on a given question. In parallel, a dual-encoder network retrieves documents that are relevant to the question from an external corpus. The generated and retrieved documents are then passed to a second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD), improving on their performance by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints at https://github.com/abdoelsayed2016/GRG.
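The three stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`embed`, `retrieve`, `grg_answer`, `stub_generate_docs`, `stub_read`) are hypothetical, the two LLMs are replaced by plain Python callables, and the learned dual encoder is stood in for by a toy bag-of-words encoder scored with a dot product.

```python
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy encoder (assumption): bag-of-words counts instead of a learned dense encoder."""
    return Counter(text.lower().split())


def score(q_vec: Counter, d_vec: Counter) -> float:
    """Dot product between question and document vectors, as a dual encoder would compute."""
    return float(sum(q_vec[token] * d_vec[token] for token in q_vec))


def retrieve(question: str, corpus: List[str], k: int) -> List[str]:
    """Return the k corpus documents with the highest similarity to the question."""
    q_vec = embed(question)
    ranked = sorted(corpus, key=lambda doc: score(q_vec, embed(doc)), reverse=True)
    return ranked[:k]


def grg_answer(
    question: str,
    corpus: List[str],
    generate_docs: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
    k: int = 2,
) -> str:
    """Generator -> Retriever -> Generator: combine both document sources, then answer."""
    generated = generate_docs(question)        # stage 1: first LLM writes contextual documents
    retrieved = retrieve(question, corpus, k)  # stage 2: dual-encoder retrieval from the corpus
    return read(question, generated + retrieved)  # stage 3: second LLM reads both and answers


def stub_generate_docs(question: str) -> List[str]:
    """Hypothetical placeholder for the first LLM."""
    return [f"(generated background for: {question})"]


def stub_read(question: str, docs: List[str]) -> str:
    """Hypothetical placeholder for the second LLM: just reports what it was given."""
    return f"answer based on {len(docs)} documents"
```

In this sketch the generated and retrieved documents are simply concatenated before being handed to the reader; how the actual system orders, filters, or weights the two sources is not specified in the abstract.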

Paper Structure (33 sections, 4 equations, 3 figures, 12 tables)

Introduction
Related Work
Retriever Reader
Retriever Generator
Generator Reader
Retriever Only
Method
Document Generation
Vector Index Retrieval
Document Retriever
Generation Model
Experimental Settings
Datasets
Choice of Document Number
Experimental Setup
...and 18 more sections

Figures (3)

Figure 1: Simplified diagram illustrating the idea behind the Generator-Retriever-Generator approach.
Figure 2: Architecture diagram illustrating the Generator-Retriever-Generator (GRG) approach, which combines document retrieval techniques and large language models to generate contextual documents and retrieve relevant information for answering questions.
Figure 3: Performance Comparison (EM) of DPR+LLaMA and InstructGPT+LLaMA models on TQA and NQ.
