
On the Robustness of Generative Information Retrieval Models

Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Changjiang Zhou, Maarten de Rijke, Xueqi Cheng

TL;DR

This work interrogates the out-of-distribution (OOD) robustness of generative information retrieval (IR) models, arguing that strong in-distribution (IID) performance can mask poor generalization to new distributions. It defines four OOD perspectives (query variations, unseen query types, unseen tasks, and corpus expansion) and evaluates generative IR models against sparse and dense retrieval baselines on the KILT benchmark. Across 11 datasets and five tasks, generative models (notably CorpusBrain) outperform the baselines in some IID settings but degrade substantially under OOD conditions, and their robustness varies by scenario. The study highlights the need for robustness-oriented benchmarks and for design choices, such as pre-training strategies, that improve reliability on knowledge-intensive retrieval tasks.
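To make "degradation under OOD conditions" concrete, one common way to summarize it is the relative drop of a retrieval metric from the IID setting to an OOD setting. The sketch below is a hedged illustration of that idea only; it is not necessarily the exact robustness measure used in the paper, and `relative_drop` and `robustness_report` are hypothetical helper names. Scores are assumed to be on the same scale (for example, percentages of some retrieval metric).

```python
from typing import Dict


def relative_drop(iid_score: float, ood_score: float) -> float:
    """Relative performance drop (in %) from the IID to the OOD setting.

    Hypothetical helper: a positive value means the model got worse OOD.
    """
    if iid_score <= 0:
        raise ValueError("iid_score must be positive")
    return 100.0 * (iid_score - ood_score) / iid_score


def robustness_report(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Map scores[model] = {"iid": ..., "ood": ...} to a relative drop per model."""
    return {model: relative_drop(s["iid"], s["ood"]) for model, s in scores.items()}
```

Comparing drops rather than raw OOD scores separates "weak overall" from "fragile under shift": a model can have a higher OOD score than a baseline and still lose a larger fraction of its IID performance.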

Abstract

Generative information retrieval (IR) methods retrieve documents by directly generating their identifiers. Much effort has been devoted to developing effective generative IR models, but less attention has been paid to their robustness. It is critical to assess the out-of-distribution (OOD) generalization of generative IR models, i.e., how well do such models generalize to new distributions? To answer this question, we focus on OOD scenarios from four perspectives in retrieval problems: (i) query variations; (ii) unseen query types; (iii) unseen tasks; and (iv) corpus expansion. Based on this taxonomy, we conduct empirical studies to analyze the OOD robustness of representative generative IR models against dense retrieval models. Our empirical results indicate that the OOD robustness of generative IR models needs improvement. By inspecting the OOD robustness of generative IR models, we aim to contribute to the development of more reliable IR models. The code is available at https://github.com/Davion-Liu/GR_OOD.
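The first sentence of the abstract compresses the core mechanism of generative IR: the model emits a document identifier token by token, and decoding must stay within the set of identifiers that actually exist in the corpus. The sketch below illustrates this with a prefix tree, similar in spirit to the constrained decoding used by title-generating models such as GENRE; it is not the implementation of any model evaluated here. It assumes identifiers are token-ID sequences ending in an end-of-sequence token. `DocidTrie`, `constrained_greedy_decode`, and `score_fn` are hypothetical names, `score_fn` stands in for a seq2seq model's next-token score, and greedy search replaces the beam search that real systems typically use.

```python
from typing import Callable, Dict, List, Sequence


class DocidTrie:
    """Prefix tree over tokenized document identifiers (hypothetical helper)."""

    def __init__(self, docids: Sequence[Sequence[int]]) -> None:
        self.root: Dict[int, dict] = {}
        for ids in docids:
            node = self.root
            for tok in ids:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Tokens that may follow `prefix`; empty if the prefix is invalid or complete."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node.keys())


def constrained_greedy_decode(
    trie: DocidTrie,
    score_fn: Callable[[List[int], int], float],  # stand-in for model scores
    eos: int,
    max_len: int = 32,
) -> List[int]:
    """Greedily pick the best-scoring token among those the trie allows."""
    prefix: List[int] = []
    for _ in range(max_len):
        options = trie.allowed_next(prefix)
        if not options:
            break
        best = max(options, key=lambda t: score_fn(prefix, t))
        prefix.append(best)
        if best == eos:  # a complete, valid identifier has been generated
            break
    return prefix
```

Because every step is restricted to children of the current trie node, the output is always an identifier present in the indexed corpus. This is also why corpus expansion is a distinct OOD scenario: newly added documents change the set of valid identifiers that the model was never trained to generate.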

Paper Structure

This paper contains 22 sections, 6 equations, 7 tables.