Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective

Yu-An Liu; Ruqing Zhang; Jiafeng Guo; Maarten de Rijke; Yixing Fan; Xueqi Cheng

TL;DR

This paper presents what the authors describe as the first comprehensive survey of robustness in neural information retrieval, concentrating on adversarial and out-of-distribution (OOD) robustness. It defines robustness across three facets (IID stability, OOD generalization, and adversarial resilience) and organizes methods around dense retrieval models (DRMs) and neural ranking models (NRMs). The survey catalogs datasets and evaluation metrics, introduces the BestIR benchmark, and discusses open issues and future directions, including the impact of large language models (LLMs) on IR robustness. Overall, it offers a structured roadmap for developing robust, trustworthy IR systems in the face of adversaries, domain shifts, and evolving data landscapes. The work emphasizes practical considerations for deployment, benchmarking, and ongoing research in the LLM era.

Abstract

Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks. The robustness of these models, essential for ensuring their reliability in practice, has also garnered significant attention. With a wide array of research on robust IR being proposed, we believe it is the opportune moment to consolidate the current status, glean insights from existing methodologies, and lay the groundwork for future development. We view the robustness of IR to be a multifaceted concept, emphasizing its necessity against adversarial attacks, out-of-distribution (OOD) scenarios and performance variance. With a focus on adversarial and OOD robustness, we dissect robustness solutions for dense retrieval models (DRMs) and neural ranking models (NRMs), respectively, recognizing them as pivotal components of the neural IR pipeline. We provide an in-depth discussion of existing methods, datasets, and evaluation metrics, shedding light on challenges and future directions in the era of large language models. To the best of our knowledge, this is the first comprehensive survey on the robustness of neural IR models, and we will also be giving our first tutorial presentation at SIGIR 2024 (https://sigir2024-robust-information-retrieval.github.io). Along with the organization of existing work, we introduce a Benchmark for robust IR (BestIR), a heterogeneous evaluation benchmark for robust neural information retrieval, which is publicly available at https://github.com/Davion-Liu/BestIR. We hope that this study provides useful clues for future research on the robustness of IR models and helps to develop trustworthy search engines (https://github.com/Davion-Liu/Awesome-Robustness-in-Information-Retrieval).

Paper Structure (109 sections, 19 equations, 11 figures, 5 tables, 2 algorithms)

Introduction
Why is robustness important in IR?
How to define robustness in IR?
Relation to other surveys
Contributions of this survey
Organization
Definition and Taxonomy
IR task
Definition of Robustness in IR
Taxonomy of Robustness in IR
IID robustness
OOD robustness
Adversarial robustness
Adversarial Robustness
Overview
...and 94 more sections

Figures (11)

Figure 1: Statistics of publications related to robust neural information retrieval and covered in this survey. "Other" includes arXiv (mostly), TREC, ICDM, NAACL, and TACL.
Figure 2: Overview of the survey. The section on future directions is only partially listed here because of space limitations.
Figure 3: The core of robust IR is to protect the stability of the Top-K results.
Figure 4: A taxonomy of robustness in IR. In this survey, we pay special attention to adversarial robustness and OOD robustness.
Figure 5: Purpose and relationship between adversarial attacks and defenses.
...and 6 more figures

Theorems & Definitions (4)

Definition 2.1: Top-$K$ robustness in information retrieval
Definition 2.2: IID robustness of information retrieval
Definition 2.3: OOD robustness of information retrieval
Definition 2.4: Adversarial robustness in information retrieval
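
As a rough illustration of the top-K stability idea named in Definition 2.1 and Figure 3, the sketch below measures how much of a ranked list's top-K survives a perturbation (for example, an adversarial edit or a shift in the query distribution). This is not the survey's formal definition, which is not reproduced on this page; the function name `top_k_overlap`, the set-overlap measure, and the document IDs are all assumptions chosen for illustration.

```python
from typing import Sequence


def top_k_overlap(original: Sequence[str], perturbed: Sequence[str], k: int) -> float:
    """Fraction of the original top-k document IDs still present in the perturbed top-k.

    Illustrative stability proxy only (an assumption, not the paper's metric).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_original = set(original[:k])
    top_perturbed = set(perturbed[:k])
    if not top_original:
        # An empty original list is trivially stable.
        return 1.0
    return len(top_original & top_perturbed) / len(top_original)


# Hypothetical rankings: document IDs are made up for this example.
clean_ranking = ["d1", "d2", "d3", "d4", "d5"]
attacked_ranking = ["d9", "d1", "d2", "d5", "d3"]  # "d9" was promoted into the top-K

print(top_k_overlap(clean_ranking, attacked_ranking, k=3))
```

A value near 1 means the top-K set is largely unchanged under the perturbation, while lower values indicate that documents were pushed in or out of the top-K. A set-overlap proxy ignores order within the top-K; a rank-sensitive measure would be needed to capture reordering.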
