Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding

Hansi Zeng; Chen Luo; Hamed Zamani

TL;DR

This work tackles a bottleneck in generative retrieval: constrained beam search prunes candidates locally, based on prefixes alone. It introduces Planning-Ahead in Generative Retrieval (PAG), a framework that pairs a set-based, lexical DocID with a sequential, semantic DocID. The sequential DocID is built via residual quantization, the two are optimized jointly, and simultaneous decoding guides the autoregressive generation. PAG outperforms the prior state-of-the-art generative retrieval model on the MS MARCO and TREC-DL benchmarks (e.g., a 15.6% MRR improvement on MS MARCO) while substantially reducing query latency and memory use, which makes retrieval more scalable. The approach offers a practical way to incorporate document-level scoring into autoregressive generation and may extend to knowledge-intensive tasks beyond passage retrieval.
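As a rough illustration of the residual quantization mentioned above, the sketch below encodes a vector greedily, one codebook at a time. It shows the generic technique only, not the paper's implementation: the function name `residual_quantize`, the hand-written two-level codebooks, and the 2-dimensional vector are all assumptions made for this example.

```python
def residual_quantize(vector, codebooks):
    """Greedy residual quantization: return one code index per codebook.

    `codebooks` is a list of levels; each level is a list of centroids.
    All names and values here are hypothetical, for illustration only.
    """
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        # Pick the centroid closest to the current residual (squared L2 distance).
        best = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        codes.append(best)
        # Subtract it, so the next level only has to explain what is left.
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes, residual


# Toy codebooks (assumed values): a coarse level followed by a finer level.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, -1.0]],
]
codes, residual = residual_quantize([5.0, 3.0], codebooks)
```

The resulting list of code indices plays the role of a sequential identifier: earlier codes capture the coarse position of the vector and later codes refine it.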

Abstract

This paper introduces PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this end, PAG constructs both a set-based and a sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained by quantizing relevance-based representations of documents. Extensive experiments on MS MARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., a 15.6% MRR improvement on MS MARCO), while achieving a 22x speedup in query latency.
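The idea of guiding prefix-level decoding with document-level scores can be sketched on a toy example. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes each prefix receives a look-ahead bonus equal to the best approximate document-level score among the documents sharing that prefix, and that this bonus is simply added to the autoregressive prefix score. The function `planning_ahead_beam_search`, the `LOCAL` score table, and all numeric values are hypothetical.

```python
from collections import defaultdict


def planning_ahead_beam_search(seq_docids, doc_scores, token_score, beam_size):
    """Toy constrained beam search over sequential DocIDs.

    seq_docids:  dict doc_id -> tuple of tokens (all the same length, unique)
    doc_scores:  dict doc_id -> approximate document-level score
                 (stand-in for a simultaneous-decoding score; assumption)
    token_score: callable (prefix, token) -> float, stand-in for the
                 autoregressive model's next-token score
    """
    length = len(next(iter(seq_docids.values())))
    beams = [((), 0.0)]  # (prefix, accumulated autoregressive score)
    for step in range(length):
        # Look-ahead bonus: best document-level score reachable from each prefix.
        bonus = defaultdict(lambda: float("-inf"))
        for doc_id, tokens in seq_docids.items():
            prefix = tokens[: step + 1]
            bonus[prefix] = max(bonus[prefix], doc_scores.get(doc_id, 0.0))
        beam_scores = dict(beams)
        candidates = {}
        for new_prefix, look_ahead in bonus.items():
            parent = new_prefix[:-1]
            if parent in beam_scores:  # only extend prefixes still in the beam
                ar = beam_scores[parent] + token_score(parent, new_prefix[-1])
                candidates[new_prefix] = (ar, ar + look_ahead)
        # Rank by the combined score, but carry forward the autoregressive part.
        top = sorted(candidates.items(), key=lambda kv: kv[1][1], reverse=True)
        beams = [(p, ar) for p, (ar, _) in top[:beam_size]]
    by_tokens = {tokens: doc_id for doc_id, tokens in seq_docids.items()}
    return [by_tokens[prefix] for prefix, _ in beams]


# Hypothetical corpus of three documents with two-token sequential DocIDs.
SEQ_DOCIDS = {"d1": (0, 0), "d2": (0, 1), "d3": (1, 0)}
# Hypothetical local scores: the first step prefers token 0, although the
# best full identifier starts with token 1.
LOCAL = {((), 0): 1.0, ((), 1): 0.5, ((0,), 0): 0.1, ((0,), 1): 0.2, ((1,), 0): 2.0}


def token_score(prefix, token):
    return LOCAL[(prefix, token)]


plain = planning_ahead_beam_search(SEQ_DOCIDS, {}, token_score, beam_size=1)
guided = planning_ahead_beam_search(
    SEQ_DOCIDS, {"d1": 0.0, "d2": 0.1, "d3": 3.0}, token_score, beam_size=1
)
```

With a beam size of 1 and no document-level scores, the search commits to the locally better first token and returns `d2`, having pruned the prefix of `d3`. With the look-ahead bonus, the prefix of `d3` survives the first step and `d3` is returned.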

Paper Structure (31 sections, 14 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Methodology
Preliminaries and Motivations
Generative Retrieval
Constrained Beam Search
Pitfalls of (Constrained) Beam Search
Planning-Ahead Constrained Beam Search
Simultaneous Decoding
Guiding Autoregressive Generation through Simultaneous Decoding
Computational Cost of Decoding
DocID Construction
Sequential DocID Construction
Set-Based DocID Construction
PAG Optimization
...and 16 more sections

Figures (4)

Figure 1: Retrieval effectiveness (MRR@10) and efficiency (query latency) of RIPOR with respect to different beam sizes on the MS MARCO Dev Set, a standard passage retrieval benchmark with 8.8M passages. The experiment is conducted on a single A100 GPU with 80GB memory. Best viewed in color.
Figure 2: A visualization of the PAG framework. Left: Illustration of simultaneous decoding guiding autoregressive generation with approximate document-level scores. Right: Illustration of the model $M$ employing joint decoding of set-based and sequential DocIDs.
Figure 3: Results on MS MARCO Dev with different beam sizes on prefix-level and document-level labels.
Figure 4: Top: Clusters of relevant documents for 20 queries sampled from TREC-19/20; the color indicates the query ID. Bottom: $\Delta$ MRR@10 on MS MARCO Dev and $\Delta$ NDCG@10 on TREC-19/20 between simultaneous+autoregressive decoding (PAG) and simultaneous decoding alone for each query.
