A Survey of Generative Information Retrieval
Tzu-Lin Kuo, Tzu-Wei Chiu, Tzung-Sheng Lin, Sheng-Yang Wu, Chao-Wei Huang, Yun-Nung Chen
TL;DR
This survey addresses the shift from traditional IR to Generative Retrieval (GR), in which a sequence-to-sequence (seq2seq) model directly maps user queries to document identifiers (DocIDs) without explicit query processing or reranking. It defines GR, surveys indexing and retrieval mechanisms, and classifies DocID strategies into numerical and string-based types, noting that semantically informed identifiers typically yield stronger retrieval signals. The paper analyzes evaluation frameworks, baselines, and dataset usage (e.g., MS MARCO, Natural Questions) and discusses the challenges posed by scalability and dynamic corpora. It identifies learnable DocIDs, higher-quality query generation, and decoder-based LLMs as promising directions, and concludes with concrete future directions in training methods, indexing scalability, and multi-task learning aimed at making GR more practical and effective in real-world information retrieval systems.
Abstract
Generative Retrieval (GR) is an emerging paradigm in information retrieval that leverages generative models to directly map queries to relevant document identifiers (DocIDs) without the need for traditional query processing or document reranking. This survey provides a comprehensive overview of GR, highlighting key developments, indexing and retrieval strategies, and open challenges. We discuss various document identifier strategies, including numerical and string-based identifiers, and explore different document representation methods. Our primary contribution lies in outlining future research directions that could profoundly impact the field: improving the quality of query generation, exploring learnable document identifiers, enhancing scalability, and integrating GR with multi-task learning frameworks. By examining state-of-the-art GR techniques and their applications, this survey aims to provide a foundational understanding of GR and to inspire further innovation in this transformative approach to information retrieval. We also make supplementary materials, such as the paper collection, publicly available at https://github.com/MiuLab/GenIR-Survey/
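The abstract describes GR as a single model that maps a query directly to a DocID, token by token. The following is a minimal sketch of that decoding loop under stated assumptions: `toy_score` is a hypothetical stand-in for a trained seq2seq decoder, and the title-like string DocIDs are hypothetical examples. The prefix trie, which restricts generation to identifiers that actually exist in the corpus, is one commonly used constrained-decoding device and is included here as an illustrative choice, not as the method of any particular system covered by the survey.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # end-of-identifier marker (assumed token name)


class DocIDTrie:
    """Prefix trie over tokenized DocIDs; only paths in the trie can be generated."""

    def __init__(self, docids: Sequence[Sequence[str]]) -> None:
        self.root: Dict[str, dict] = {}
        for docid in docids:
            node = self.root
            # Each DocID ends with EOS so decoding knows where an identifier is complete.
            for tok in list(docid) + [EOS]:
                node = node.setdefault(tok, {})

    def allowed(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that may follow `prefix` in some valid DocID."""
        node = self.root
        for tok in prefix:
            node = node[tok]
        return list(node.keys())


def toy_score(query: str, prefix: Sequence[str], tok: str) -> float:
    """Hypothetical scorer standing in for the decoder's next-token probability."""
    if tok == EOS:
        return 0.5  # prefer stopping only when no remaining token matches the query
    return 1.0 if tok in query.lower().split() else 0.0


def greedy_generate(
    query: str,
    score_fn: Callable[[str, Sequence[str], str], float],
    trie: DocIDTrie,
    max_len: int = 16,
) -> Tuple[str, ...]:
    """Greedily decode a DocID, choosing only among trie-permitted next tokens."""
    prefix: List[str] = []
    for _ in range(max_len):
        candidates = trie.allowed(prefix)
        best = max(candidates, key=lambda tok: score_fn(query, prefix, tok))
        if best == EOS:
            break
        prefix.append(best)
    return tuple(prefix)


if __name__ == "__main__":
    trie = DocIDTrie([
        ("python", "tutorial"),
        ("python", "snake", "species"),
        ("java", "island"),
    ])
    print(greedy_generate("species of python snake", toy_score, trie))
```

Because every step chooses only among tokens the trie permits, the output is always a DocID that exists in the index, which is what lets a generative model act as a retriever. Numerical DocIDs (for example, digit sequences) could be inserted into the same trie unchanged; only the scorer would differ. A real system would typically replace greedy search with beam search to return a ranked list of DocIDs.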
