Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

Atticus Geiger, Kyle Richardson, Christopher Potts

TL;DR

The work asks whether neural NLI models capture the monotonicity interactions between lexical entailment and negation. It introduces the MoNLI dataset and applies both behavioral and structural evaluations. Behaviorally, models trained on general-purpose NLI data struggle with examples containing negation, but fine-tuning on MoNLI addresses this failure, and in the systematic generalization task models demonstrate the ability to generalize to unseen substitutions under negation. Structurally, probes and interchange interventions provide evidence that the top-performing BERT-based model partially mirrors the causal dynamics of the monotonicity algorithm, which suggests that parts of the network encode lexical entailment and negation at an algorithmic level. The study advocates a holistic evaluation approach for understanding when and how neural models internalize compositional semantics, with implications for interpretability and robust reasoning in NLP.

Abstract

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation. In our behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure. In our structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI. Probes yield evidence consistent with this conclusion, and our intervention experiments bolster this, showing that the causal dynamics of the model mirror the causal dynamics of this algorithm on subsets of MoNLI. This suggests that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.

Paper Structure

This paper contains 26 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An algorithm able to solve the MoNLI dataset that provides a theoretically motivated learning target for neural models at an algorithmic level of analysis (Marr, 1982). Infer takes in an example from MoNLI and outputs the relation between the premise and hypothesis. It uses three predefined functions. get-lex-rel returns the relation (one of $\{\sqsupset, \sqsubset\}$) between the substituted words in the premise and hypothesis. contains-not returns true iff negation is present. reverse maps $\sqsubset$ to $\sqsupset$ and vice-versa.
  • Figure 2: Results where classifier probes are trained on BERT representations to predict the value of lexrel and the output of Infer (Figure 1). The grey dotted line provides a soft ceiling for selectivity values, because we expect control probes trained on a binary task to at least achieve chance accuracy.
  • Figure 3: An illustrative interchange intervention: The solid arrows represent a hypothesis about where the model stores and uses information about lexical entailment. The dotted arrow is an interchange intervention, where the green vector (top) we think stores reverse entailment, trees $\sqsupset$ elms, is interchanged with the red vector (middle) we think stores forward entailment, pugs $\sqsubset$ dogs, leading to a modified network (bottom). If our hypothesis is correct, then the output should change from entailment to neutral, because the negation in the green example reverses the relationship between lexical entailment and sentence-level entailment. If this label reversal is not observed, crucial entailment information must lie elsewhere in the network.
  • Figure 4: Inoculation results for our four models performing our systematic generalization task.
  • Figure 5: A visualization of the largest subset of MoNLI on which we verified BERT mimics the causal dynamics of Infer. This subset contains 98 examples and we display the substituted words in each. The first word in the pair comes from the premise and we cluster word pairs based on hyponyms.
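
The Infer algorithm described in the Figure 1 caption, and the expected outcome of the interchange in Figure 3, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Example` class, the toy `HYPERNYM_OF` lexicon, the token-based negation check, the example sentences, and the helpers `to_label` and `infer_with_interchange` are all hypothetical. The mapping of $\sqsubset$ to entailment and $\sqsupset$ to neutral is read off the Figure 3 caption. Note that `infer_with_interchange` swaps the lexrel value inside the algorithm only; it does not touch any neural network.

```python
from dataclasses import dataclass

FORWARD = "⊏"  # premise word is more specific than the hypothesis word
REVERSE = "⊐"  # premise word is more general than the hypothesis word

# Hypothetical toy lexicon (hyponym -> hypernym); not taken from MoNLI.
HYPERNYM_OF = {"pug": "dog", "elm": "tree"}


@dataclass(frozen=True)
class Example:
    premise: str
    hypothesis: str
    premise_word: str     # substituted word as it appears in the premise
    hypothesis_word: str  # substituted word as it appears in the hypothesis


def get_lex_rel(ex: Example) -> str:
    """Return the lexical relation between the two substituted words."""
    if HYPERNYM_OF.get(ex.premise_word) == ex.hypothesis_word:
        return FORWARD
    if HYPERNYM_OF.get(ex.hypothesis_word) == ex.premise_word:
        return REVERSE
    raise ValueError("word pair not covered by the toy lexicon")


def contains_not(ex: Example) -> bool:
    """Assumed negation check: a literal 'not' token in the premise."""
    return "not" in ex.premise.lower().split()


def reverse(rel: str) -> str:
    """Map FORWARD to REVERSE and vice versa."""
    return REVERSE if rel == FORWARD else FORWARD


def infer(ex: Example) -> str:
    """Sentence-level relation: the lexical relation, flipped under negation."""
    lexrel = get_lex_rel(ex)
    return reverse(lexrel) if contains_not(ex) else lexrel


def to_label(rel: str) -> str:
    """Assumed label mapping, following the Figure 3 caption."""
    return "entailment" if rel == FORWARD else "neutral"


def infer_with_interchange(base: Example, source: Example) -> str:
    """Run Infer on `base`, but with lexrel taken from `source`."""
    lexrel = get_lex_rel(source)
    return reverse(lexrel) if contains_not(base) else lexrel


# Hypothetical sentences built around the word pairs named in Figure 3.
base = Example("This is not a tree", "This is not an elm", "tree", "elm")
source = Example("This is a pug", "This is a dog", "pug", "dog")

print(to_label(infer(base)))                           # entailment
print(to_label(infer_with_interchange(base, source)))  # neutral
```

The second call reproduces the label reversal that Figure 3 predicts: the base example keeps its negation, but its lexical relation is replaced by the forward entailment from the source example, so the output flips from entailment to neutral.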