A Comprehensive Taxonomy of Negation for NLP and Neural Retrievers

Roxana Petcu; Samarth Bhargav; Maarten de Rijke; Evangelos Kanoulas

TL;DR

The paper tackles the persistent difficulty of negation in neural IR and LLM-based retrieval by introducing a comprehensive taxonomy rooted in logic and linguistics, plus two synthetic benchmarks that exhaustively cover negation types. It pairs these with a logic-based LM classification framework for analyzing dataset coverage and model behavior. Experimental results show that cross-encoders and transformer-based models encode negation more effectively, and that synthetic data can speed up convergence on negation tasks, although out-of-domain generalization (e.g., to MS MARCO) may suffer. The work highlights the importance of broad negation coverage and architecture-aware training, and suggests future directions such as reinforcement learning approaches and larger-scale retrieval on negation-rich corpora.

Abstract

Understanding and solving complex reasoning tasks is vital for addressing the information needs of a user. Although dense neural models learn contextualised embeddings, they still underperform on queries containing negation. To understand this phenomenon, we study negation in both traditional neural information retrieval and LLM-based models. We (1) introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions; (2) generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models and to fine-tune models for more robust performance on negation; and (3) propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets. Our taxonomy produces a balanced data distribution over negation types, providing a better training setup that leads to faster convergence on the NevIR dataset. Moreover, we propose a classification schema that reveals the coverage of negation types in existing datasets, offering insights into the factors that might affect the generalization of fine-tuned models on negation.
