From RAG to RICHES: Retrieval Interlaced with Sequence Generation

Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

TL;DR

RICHES presents a unified approach to retrieval-grounded generation: it interleaves retrieval with sequence generation in a single decoding pass. It layers constrained beam decoding over an FM-index of the corpus and uses a propositional indexing strategy to ground outputs in that corpus, enabling attributed and multi-hop QA without additional training. The method is evaluated on open-domain QA benchmarks, showing competitive single-hop performance and strong multi-hop gains in a single pass, with larger instruction-tuned models further improving results. The work highlights the practicality of prompting-based, end-to-end retrieval-generation systems and outlines the trade-offs, limitations, and ethical considerations of such an approach.

Abstract

We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for a separate retriever and generator. It retrieves documents by directly decoding their contents, constrained on the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RICHES can work with any instruction-tuned model, without additional training. It provides attributed evidence, supports multi-hop retrievals, and interleaves thoughts to plan what to retrieve next, all within a single decoding pass of the LLM. We demonstrate the strong performance of RICHES across ODQA tasks, including attributed and multi-hop QA.

Paper Structure

This paper contains 53 sections, 2 equations, 3 figures, and 15 tables.

Figures (3)

  • Figure 1: Example Riches outputs for multi-hop queries with a single LLM and decoding pass. The green quoted text is "retrieved" or generated verbatim from the retrieval corpus. Riches generation natively interleaves thoughts and multiple retrieval evidences.
  • Figure 2: Visualization of the constrained beam for the query "when did marathon change its name to snickers?". The final Riches output is "Marathon was renamed Snickers in 1990". Bold boxes track the progress of the top-beam sequence. Grey crossed-out boxes are sequences that the LLM preferred but that were blocked by corpus constraints.
  • Figure 3: Illustration of the constrained decoding process. Given the prefix "Joker is played by", the continuation "Nolan" is not found in the corpus and is therefore masked out.
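
The masking step described in Figure 3 can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the two-passage corpus, the whitespace tokenization, the candidate scores, and the helper names `allowed_continuations` and `mask_scores` are all assumptions made for the example, and a linear scan over the corpus stands in for the FM-index lookup.

```python
from typing import Dict, List, Set


def allowed_continuations(corpus: List[str], prefix: List[str]) -> Set[str]:
    """Return every token that directly follows `prefix` somewhere in the corpus.

    A naive linear scan is used here purely for illustration; it is a
    stand-in for an index-based lookup.
    """
    allowed: Set[str] = set()
    n = len(prefix)
    for passage in corpus:
        tokens = passage.split()  # assumption: whitespace tokenization
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == prefix:
                allowed.add(tokens[i + n])
    return allowed


def mask_scores(scores: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Set the score of any candidate token not licensed by the corpus to -inf."""
    return {tok: (s if tok in allowed else float("-inf")) for tok, s in scores.items()}


# Hypothetical toy corpus and hypothetical model scores for the next token.
corpus = [
    "Joker is played by Heath Ledger",
    "The Dark Knight is directed by Christopher Nolan",
]
prefix = "Joker is played by".split()
scores = {"Nolan": 2.0, "Heath": 1.5, "Christopher": 0.3}

masked = mask_scores(scores, allowed_continuations(corpus, prefix))
best = max(masked, key=masked.get)
print(best)
```

In this toy setup the model's preferred continuation "Nolan" is masked because it never follows the prefix in the corpus, so decoding proceeds with a continuation the corpus does contain. Constrained beam decoding, as in Figure 2, would apply the same mask to every beam at every step.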