Making Language Models Robust Against Negation

MohammadHossein Rezaei, Eduardo Blanco

TL;DR

This paper addresses the persistent difficulty that language models have with negation. It introduces two self-supervised pre-training tasks, Next Sentence Polarity Prediction (NSPP) and a polarity-reversed variant of Next Sentence Prediction (NSP), that improve reasoning about negation without requiring labeled data. Across CondaQA, several NLI/NLU datasets, and LAMA/LAMA-Neg, models further pre-trained on these tasks show consistent improvements in negation robustness, with NSP generally delivering stronger gains than NSPP. Joint training on both tasks helps in some settings but not universally, which suggests that the benefit depends on the task and the model size. The approach improves negation reasoning while remaining competitive on inputs without negation and, because it needs no labeled data, is presented as scalable and language-agnostic.

Abstract

Negation has been a long-standing challenge for language models. Previous studies have shown that they struggle with negation in many natural language understanding tasks. In this work, we propose a self-supervised method to make language models more robust against negation. We introduce a novel task, Next Sentence Polarity Prediction (NSPP), and a variation of the Next Sentence Prediction (NSP) task. We show that BERT and RoBERTa further pre-trained on our tasks outperform the off-the-shelf versions on nine negation-related benchmarks. Most notably, our pre-training tasks yield between 1.8% and 9.1% improvement on CondaQA, a large question-answering corpus requiring reasoning over negation.
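
To make the NSPP setup concrete (given a sentence, predict whether the next sentence will contain negation), the following is a minimal sketch of how such self-supervised examples could be built from raw text. It is an illustration under stated assumptions, not the authors' pipeline: the cue list `NEGATION_CUES` and the helpers `has_negation` and `build_nspp_examples` are hypothetical names, and the source does not say how negation cues are actually detected.

```python
import re

# Hypothetical cue list for illustration only; the paper's actual
# negation-cue detection method is not described in this summary.
NEGATION_CUES = {"not", "no", "never", "nobody", "nothing",
                 "none", "neither", "nor", "without"}


def has_negation(sentence: str) -> bool:
    """Return True if the sentence contains a cue word or an "n't" contraction."""
    # Lowercase and split into word-like tokens, keeping straight apostrophes
    # so that contractions such as "didn't" stay in one piece.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in NEGATION_CUES or tok.endswith("n't") for tok in tokens)


def build_nspp_examples(sentences):
    """Pair each sentence with a binary label for its successor.

    The label is 1 if the *next* sentence contains negation, else 0.
    Labels come from the text itself, so no manual annotation is needed.
    """
    return [
        (first, int(has_negation(second)))
        for first, second in zip(sentences, sentences[1:])
    ]


if __name__ == "__main__":
    doc = [
        "The store was open.",
        "I didn't buy anything.",
        "The prices were high.",
        "Nobody complained.",
    ]
    for text, label in build_nspp_examples(doc):
        print(label, text)
```

Each resulting pair is an ordinary binary classification example, so it could be fed to a sequence classification head during further pre-training. A keyword match like this one will mislabel some sentences (for example, "no" used as an interjection), which is one reason to treat it only as a sketch.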

Paper Structure

This paper contains 33 sections, 8 figures, 10 tables.

Figures (8)

  • Figure 1: An example of the training data for our self-supervised tasks. The tasks are: (a) given a sentence, predict whether the next sentence will contain negation (NSPP) and (b) given two sentences, predict whether the second sentence is a coherent continuation of the first one (NSP).
  • Figure 2: Trends in pre-training transformers on NSPP, NSP, and both tasks jointly from left to right. Validation loss decreases as the model is trained on larger subsets of the corpus. We stop training when the validation loss plateaus.
  • Figure 3: Examples of Llama-2-7B failing to remove the negation cue from a sentence. The model resists removing the negation cue, arguing that the sentence is factually incorrect or incoherent. In the last example, the model returns the original sentence unchanged while claiming that it has removed the negation cue and fixed the grammar.
  • Figure 4: Examples of prompting ChatGPT to remove negation cues from a sentence. In the first example, the model replaces the negation cue "can't" with "cannot". Updating the prompt and asking the model to remove any negation cues rather than specifically "n't" results in the same problem.
  • Figure 5: An example of Llama-2-7B adding the negation cue "not" to a sentence. The model resists adding the negation cue, arguing that the sentence is inappropriate or disrespectful. When asked to add the negation cue without considering the appropriateness or factuality of the sentence and focusing on grammar, the model adds the negation cue to the beginning of the sentence instead of the main verb.
  • ...and 3 more figures