ExcluIR: Exclusionary Neural Information Retrieval

Wenhao Zhang, Mengqi Zhang, Shiguang Wu, Jiahuan Pei, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Pengjie Ren

TL;DR

ExcluIR tackles the understudied problem of exclusionary retrieval by introducing a dedicated benchmark (3,452 annotated queries) and a large training set (70,293 exclusionary queries) built from HotpotQA via ChatGPT and human correction. The study systematically compares sparse, dense, and generative retrieval approaches, finding that existing models struggle with exclusionary intent and that generative models have a natural edge when trained on exclusionary data. Key findings show that incorporating ExcluIR data substantially improves exclusionary performance, with generative methods gaining the most, while simply expanding model size or data domains yields limited or inconsistent gains. The dataset and evaluation scripts enable targeted research on exclusionary retrieval, offering actionable insights into model architectures and training strategies for handling negation-like and exclusionary language in retrieval tasks.

Abstract

Exclusion is an important and universal linguistic skill that humans use to express what they do not want. However, in the information retrieval community, there is little research on exclusionary retrieval, where users express what they do not want in their queries. In this work, we investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval, consisting of an evaluation benchmark and a training set that helps retrieval models comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses, obtaining three main observations: (1) Existing retrieval models with different architectures struggle to effectively comprehend exclusionary queries; (2) Although integrating our training data can improve the performance of retrieval models on exclusionary retrieval, there still exists a gap compared to human performance; (3) Generative retrieval models have a natural advantage in handling exclusionary queries. To facilitate future research on exclusionary retrieval, we share the benchmark and evaluation scripts at https://github.com/zwh-sdu/ExcluIR.

Paper Structure (28 sections, 6 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: A comparison between non-exclusionary and exclusionary queries. Exclusionary queries often specify content to be excluded (e.g., "Avengers: Endgame") to express the user's requirement to omit certain information. In this case, if the retrieval system fails to comprehend the exclusionary nature of a query (e.g., one containing the term "besides"), it will produce retrieval results that users do not want.
  • Figure 2: Overview of ExcluIR dataset construction process.
  • Figure 3: Distribution of the lengths of exclusionary queries in ExcluIR.
  • Figure 4: Performance of models under different training data settings. The upper figures show the RR score of various models on the ExcluIR benchmark, and the lower figures show the performance of these models on HotpotQA and NQ320k. The different colors of the bars represent different training data. Full results are presented in the appendix.
  • Figure 5: Summary of the analysis that shows the differences between dense retrieval and generative retrieval models in handling ExcluIR.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Definition 1
  • Claim 1
  • proof