Generative Retrieval as Multi-Vector Dense Retrieval
Shiguang Wu, Wenda Wei, Mengqi Zhang, Zhumin Chen, Jun Ma, Zhaochun Ren, Maarten de Rijke, Pengjie Ren
TL;DR
The paper investigates how Generative Retrieval (GR) relates to Multi-Vector Dense Retrieval (MVDR). It shows that the relevance score computed by GR's decoder can be written in the MVDR framework as a sum over token-level interactions weighted by an attention-derived alignment matrix. GR and MVDR thus share the same core computation, scoring query-document relevance from token-level vectors and an alignment mechanism, but differ in how documents are encoded, how sparse the alignment is, and in which direction it runs. Through theoretical derivations and experiments with T5-based variants on NQ320K and MS MARCO, the authors demonstrate a low-rank structure in the alignment matrices and comparable term-matching behavior in both paradigms. They also highlight practical differences between end-to-end and reranking settings and the impact of improved document encoding (PAWA, NP decoding). The findings give a principled basis for integrating GR into MVDR-inspired frameworks and for improving generative retrieval through better alignment strategies and document representations.
Abstract
Generative retrieval generates identifiers of relevant documents in an end-to-end manner using a sequence-to-sequence architecture for a given query. The relation between generative retrieval and other retrieval methods, especially those based on matching within dense retrieval models, is not yet fully understood. Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval. Accordingly, when using hierarchical semantic identifiers, generative retrieval exhibits behavior analogous to hierarchical search within a tree index in dense retrieval. However, prior work focuses solely on the retrieval stage without considering the deep interactions within the decoder of generative retrieval. In this paper, we fill this gap by demonstrating that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query. Specifically, we examine the attention layer and prediction head of generative retrieval, revealing that generative retrieval can be understood as a special case of multi-vector dense retrieval. Both methods compute relevance as a sum of products of query vectors, document vectors, and the entries of an alignment matrix. We then explore how generative retrieval applies this framework, employing distinct strategies for computing document token vectors and the alignment matrix. We conduct experiments to verify our conclusions and show that both paradigms exhibit commonalities of term matching in their alignment matrices.
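The shared framework described in the abstract, relevance as a sum of products of query vectors, document vectors, and an alignment matrix, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the toy vectors, the ColBERT-style max-sim rule used as the sparse alignment, and the column-wise softmax used as a stand-in for a dense, attention-derived alignment in which each document token distributes its weight over the query tokens.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def interaction_matrix(q_vecs, d_vecs):
    """sim[i][j] = <q_i, d_j> for every query token i and document token j."""
    return [[dot(q, d) for d in d_vecs] for q in q_vecs]

def maxsim_alignment(sim):
    """Sparse alignment (assumed max-sim rule): each query token is aligned
    only to its best-matching document token."""
    alignment = []
    for row in sim:
        best = max(range(len(row)), key=row.__getitem__)
        alignment.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return alignment

def softmax_alignment(sim):
    """Dense alignment (illustrative stand-in for attention): each document
    token spreads a softmax weight over the query tokens, so every column
    of the alignment matrix sums to 1."""
    n, m = len(sim), len(sim[0])
    alignment = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [sim[i][j] for i in range(n)]
        shift = max(column)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in column]
        total = sum(exps)
        for i in range(n):
            alignment[i][j] = exps[i] / total
    return alignment

def relevance(q_vecs, d_vecs, align_fn):
    """rel(q, d) = sum over i, j of A[i][j] * <q_i, d_j>."""
    sim = interaction_matrix(q_vecs, d_vecs)
    alignment = align_fn(sim)
    return sum(
        alignment[i][j] * sim[i][j]
        for i in range(len(sim))
        for j in range(len(sim[0]))
    )

# Hypothetical toy inputs: 2 query token vectors, 3 document token vectors.
Q = [[1.0, 0.0], [0.0, 1.0]]
D = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

print(relevance(Q, D, maxsim_alignment))   # sparse, query-to-document alignment
print(relevance(Q, D, softmax_alignment))  # dense, document-to-query alignment
```

Only the alignment function changes between the two calls; the scoring rule is the same, which is the sense in which the two paradigms can be read as instances of one framework.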
