A Comprehensive Taxonomy of Negation for NLP and Neural Retrievers

Roxana Petcu, Samarth Bhargav, Maarten de Rijke, Evangelos Kanoulas

TL;DR

The paper tackles the persistent difficulty of negation in neural IR and LLM-based retrieval by introducing a comprehensive taxonomy rooted in logic and linguistics, plus two synthetic benchmarks that exhaustively cover negation types. It pairs these with a logic-based LM classification framework for analyzing dataset coverage and model behavior. Experiments show that cross-encoders and transformer-based models encode negation more effectively and that synthetic data can speed up convergence on negation tasks, although out-of-domain generalization (e.g., to MSMarco) may suffer. The work highlights the importance of broad negation coverage and architecture-aware training, and it suggests future directions such as reinforcement learning approaches and larger-scale retrieval on negation-rich corpora.

Abstract

Understanding and solving complex reasoning tasks is vital for addressing the information needs of a user. Although dense neural models learn contextualised embeddings, they still underperform on queries containing negation. To understand this phenomenon, we study negation in both traditional neural information retrieval and LLM-based models. We (1) introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions; (2) generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models and to fine-tune models for a more robust performance on negation; and (3) propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets. Our taxonomy produces a balanced data distribution over negation types, providing a better training setup that leads to faster convergence on the NevIR dataset. Moreover, we propose a classification schema that reveals the coverage of negation types in existing datasets, offering insights into the factors that might affect the generalization of fine-tuned models on negation.

Paper Structure

This paper contains 26 sections, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Example instance from our Free Generation dataset for sentential negation. Doc 1 is a passage retrieved from an existing Wikipedia article; Doc 2 is a minimally edited counterfactual whose truth value is flipped. The task is pairwise ranking: given two queries that differ only in the presence of negation, the retrieval model must rank the corresponding document higher for each query. The model succeeds only if it ranks the correct document higher for both queries, so random-chance pairwise accuracy is $25\%$.
  • Figure 2: Negation taxonomy tree.
  • Figure 3: Pairwise Accuracy on the Free Generation dataset. The first result column covers the full dataset; each later column covers one negation type. Rows represent models, where I is shorthand for Instruct. On the right, we assign labels expressing the architecture and training objective of each model: the first position shows the architecture, i.e., Sparse, Bi-encoder, Dual encoder, Cross-encoder, and Transformer; the second position shows the training objective, i.e., Retrieval, Search, Similarity, Ranking, Natural Language Inference, and Next Token Prediction. For a close-up, see the results appendix.
  • Figure 4: Pairwise Accuracy on NevIR as split with our classification mechanism.
  • Figure 5: Prompts for Sentential Negation
  • ...and 9 more figures
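
The pairwise ranking metric described in the Figure 1 caption can be illustrated with a minimal sketch. The instance layout, the function names, and both toy scorers below are assumptions made for illustration only; they are not the paper's evaluation code or data format. The sketch counts an instance as correct only when the matching document is ranked strictly higher for both queries, which is why two independent coin-flip rankings give a chance level of 0.5 × 0.5 = 25%.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical instance layout (an assumption, not the paper's data format):
# (query_1, query_2, doc_1, doc_2), where doc_1 matches query_1 and doc_2 matches query_2.
Instance = Tuple[str, str, str, str]
Scorer = Callable[[str, str], float]

# Two independent 50/50 rankings must both be right: 0.5 * 0.5.
CHANCE_LEVEL = 0.5 * 0.5


def pair_is_correct(score: Scorer, instance: Instance) -> bool:
    """True only if the matching document is ranked strictly higher for BOTH queries."""
    q1, q2, d1, d2 = instance
    return score(q1, d1) > score(q1, d2) and score(q2, d2) > score(q2, d1)


def pairwise_accuracy(score: Scorer, instances: Sequence[Instance]) -> float:
    """Fraction of instances on which both rankings are correct."""
    if not instances:
        raise ValueError("need at least one instance")
    return sum(pair_is_correct(score, inst) for inst in instances) / len(instances)


def overlap_score(query: str, doc: str) -> float:
    """Toy lexical scorer: number of distinct lowercase tokens shared by query and doc."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def negation_aware_score(query: str, doc: str) -> float:
    """Toy scorer: 1.0 if query and doc agree on containing the token 'not', else 0.0."""
    q_neg = "not" in query.lower().split()
    d_neg = "not" in doc.lower().split()
    return 1.0 if q_neg == d_neg else 0.0


# Made-up example instance, for illustration only.
toy_instances = [
    (
        "which metals are magnetic",
        "which metals are not magnetic",
        "iron is magnetic",
        "copper is not magnetic",
    )
]

print(pairwise_accuracy(overlap_score, toy_instances))         # 0.0
print(pairwise_accuracy(negation_aware_score, toy_instances))  # 1.0
```

On this toy instance the lexical scorer ties on the non-negated query (both documents share only "magnetic"), so the pair counts as a failure even though the negated query is ranked correctly. Requiring both rankings to be right is what makes the metric stricter than per-query accuracy.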