Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding
Hansi Zeng, Chen Luo, Hamed Zamani
TL;DR
This work addresses a bottleneck in generative retrieval: constrained beam search prunes candidates locally, based only on the DocID prefix generated so far, so a relevant document can be discarded before its full identifier is decoded. It introduces Planning Ahead in Generative Retrieval (PAG), a framework that assigns each document two identifiers: a set-based DocID made of lexical tokens and a sequential DocID made of semantic codes obtained via residual quantization. The two identifiers are optimized jointly, and simultaneous decoding uses document-level scores from the set-based DocID to guide the autoregressive generation of the sequential DocID. PAG substantially outperforms the prior state of the art on the MS MARCO and TREC Deep Learning (TREC-DL) benchmarks (e.g., a 15.6% MRR improvement on MS MARCO) while reducing query latency (a 22x speed-up) and improving memory efficiency, enabling scalable retrieval. The approach offers a practical way to incorporate document-level scoring into autoregressive generation and may extend to other knowledge-intensive tasks beyond passage retrieval.
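The sequential DocID is described as a semantic code produced by residual quantization. Below is a minimal sketch of plain residual quantization, not PAG's training procedure: it assumes a 2-dimensional toy embedding and small hand-picked codebooks (in practice the codebooks would be learned, commonly by running k-means on each level's residuals), and it leaves out the joint optimization entirely. The names `residual_quantize`, `reconstruct`, and `CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Encode an embedding as one codeword index per level (a toy sequential DocID)."""
    residual = list(embedding)
    codes: List[int] = []
    for codebook in codebooks:
        # Pick the codeword nearest to what earlier levels have not yet explained.
        best = min(range(len(codebook)), key=lambda i: squared_distance(residual, codebook[i]))
        codes.append(best)
        # Subtract it so the next level quantizes the remaining residual.
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes


def reconstruct(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Approximate the original embedding by summing the selected codewords."""
    approx = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(codes):
        approx = [a + c for a, c in zip(approx, codebooks[level][idx])]
    return approx


# Hypothetical two-level codebooks: a coarse level and a finer correction level.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    [[0.1, 0.0], [0.0, 0.1], [0.0, -0.1]],
]

print(residual_quantize([0.9, 0.1], CODEBOOKS))  # [0, 1]
print(residual_quantize([0.1, 0.8], CODEBOOKS))  # [1, 2]
print(reconstruct([0, 1], CODEBOOKS))            # [1.0, 0.1]
```

Each level refines what the previous levels left unexplained, so documents with nearby representations tend to share leading codes, which is what makes prefix-constrained decoding over such codes meaningful.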
Abstract
This paper introduces PAG, a novel optimization and decoding approach that guides the autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this end, PAG constructs both a set-based and a sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained by quantizing relevance-based representations of documents. Extensive experiments on MS MARCO and TREC Deep Learning Track data show that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., a 15.6% MRR improvement on MS MARCO) while achieving a 22x speed-up in query latency.
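To illustrate how document-level scores from a set-based identifier could steer the generation of a sequential identifier, here is a simplified sketch; it is not the paper's exact scoring or decoding algorithm. It assumes (1) a bag-of-words document score equal to the sum of hypothetical query-conditioned weights over the tokens in a document's set-based DocID, (2) unique, fixed-length sequential DocIDs, and (3) that "planning ahead" can be approximated by adding, to each partial prefix's sequential score, the best set-based score of any document under that prefix. All function names, DocIDs, and scores are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Code = Tuple[int, ...]  # a (partial) sequential DocID


def set_based_scores(token_weights: Dict[str, float],
                     set_docids: Dict[str, Set[str]]) -> Dict[str, float]:
    """Bag-of-words score: sum the query's weight for each token in a doc's set-based DocID."""
    return {doc: sum(token_weights.get(tok, 0.0) for tok in tokens)
            for doc, tokens in set_docids.items()}


def prefix_bounds(seq_docids: Dict[str, Code],
                  doc_scores: Dict[str, float]) -> Dict[Code, float]:
    """Map every prefix of every sequential DocID to the best set-based score below it."""
    bounds: Dict[Code, float] = {}
    for doc, code in seq_docids.items():
        for i in range(len(code) + 1):
            bounds[code[:i]] = max(bounds.get(code[:i], float("-inf")), doc_scores[doc])
    return bounds


def guided_beam_search(step_score: Callable[[Code, int], float],
                       seq_docids: Dict[str, Code],
                       doc_scores: Dict[str, float],
                       beam_size: int,
                       guided: bool = True) -> List[str]:
    """Prefix-constrained beam search over unique, equal-length sequential DocIDs.

    With guided=True, a partial prefix is ranked by its accumulated sequential
    score plus the best set-based score reachable under it (a hypothetical
    look-ahead term), so a prefix leading to a lexically strong document is
    less likely to be pruned early.
    """
    bounds = prefix_bounds(seq_docids, doc_scores)
    id_to_doc = {code: doc for doc, code in seq_docids.items()}
    length = len(next(iter(id_to_doc)))

    def rank_key(beam: Tuple[Code, float]) -> float:
        prefix, seq = beam
        return seq + bounds[prefix] if guided else seq

    beams: List[Tuple[Code, float]] = [((), 0.0)]  # (prefix, sequential score)
    for step in range(length):
        candidates = []
        for prefix, seq in beams:
            # Constrained decoding: only codes that extend some valid DocID.
            allowed = sorted({c[step] for c in id_to_doc if c[:step] == prefix})
            for code in allowed:
                candidates.append((prefix + (code,), seq + step_score(prefix, code)))
        beams = sorted(candidates, key=rank_key, reverse=True)[:beam_size]
    return [id_to_doc[prefix] for prefix, _ in beams]


# Toy data (hypothetical): the decoder alone prefers prefix (0,),
# but the lexically best document d3 lives under prefix (1,).
SET_DOCIDS = {"d1": {"apple", "pie"}, "d2": {"apple", "phone"},
              "d3": {"banana", "bread", "recipe"}, "d4": {"banana", "split"}}
SEQ_DOCIDS = {"d1": (0, 0), "d2": (0, 1), "d3": (1, 0), "d4": (1, 1)}
QUERY_WEIGHTS = {"banana": 1.0, "bread": 1.5, "recipe": 1.25, "apple": 0.25}
STEP = {(): {0: -0.5, 1: -1.0}, (0,): {0: -0.5, 1: -1.0}, (1,): {0: -0.25, 1: -2.0}}


def step_score(prefix: Code, code: int) -> float:
    """Stand-in for the decoder's next-code log-probability."""
    return STEP[prefix][code]


scores = set_based_scores(QUERY_WEIGHTS, SET_DOCIDS)
print(guided_beam_search(step_score, SEQ_DOCIDS, scores, beam_size=1, guided=False))  # ['d1']
print(guided_beam_search(step_score, SEQ_DOCIDS, scores, beam_size=1))                # ['d3']
```

With a beam of size 1, plain prefix-constrained search commits to prefix `(0,)` because the decoder prefers it locally, and it never reaches `d3`; adding the look-ahead term keeps the prefix `(1,)` that leads to the lexically strongest document.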
