Estimating the Causal Effects of Natural Logic Features in Neural NLI Models

Julia Rozanova; Marco Valentino; Andre Freitas

TL;DR

This work introduces a causal intervention framework to quantify how two semantic features, context monotonicity and word-pair relations, drive NLI model predictions. By constructing a causal diagram for the NLI-XY subtask and estimating total causal effects (TCE) and undesired direct causal effects (DCE) from interventional data, the authors assess model robustness to surface-form changes and sensitivity to semantic changes. They compare several models, including HELP-fine-tuned variants, and find that high benchmark accuracy does not guarantee strong causal robustness. HELP fine-tuning generally improves sensitivity to context and reduces reliance on lexical cues, though trade-offs with standard benchmarks can occur. The findings advance interpretability in NLP by linking causal influence to model behavior, informing interventions that strengthen reasoning patterns and reduce brittle reliance on superficial cues.

Abstract

Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this form of analysis is so desirable from both an interpretability and a model evaluation perspective that it is valuable to home in on specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely used models. In this vein, we pick a portion of the NLI task for which an explicit causal diagram can be systematically constructed: the case where, across two sentences (the premise and hypothesis), two related words/terms occur in a shared context. In this work, we apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and interventions on the inserted word-pair (whose effect on the entailment label is mediated by the relation between these words). Following related work on causal analysis of NLP models in different settings, we adapt the methodology for the NLI task to construct comparative model profiles in terms of robustness to irrelevant changes and sensitivity to impactful changes.
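As a rough illustration of the "robustness to irrelevant changes and sensitivity to impactful changes" idea, the sketch below measures how often a model's prediction flips under an intervention, split by whether the gold label should change. It is a minimal sketch under stated assumptions, not the paper's estimator: the names `InterventionPair` and `prediction_change_rates`, the toy `lexical_heuristic` model, and the three example sentence pairs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class InterventionPair(NamedTuple):
    """One example before and after an intervention (hypothetical structure)."""
    original: Tuple[str, str]    # (premise, hypothesis) before the intervention
    intervened: Tuple[str, str]  # (premise, hypothesis) after the intervention
    label_changes: bool          # whether the gold entailment label should change


def prediction_change_rates(
    pairs: Iterable[InterventionPair],
    predict: Callable[[str, str], str],
) -> Dict[str, float]:
    """Fraction of interventions that flip the model's prediction.

    'sensitivity' is computed over interventions that should change the label;
    'unwanted_change' is computed over interventions that should not.
    """
    changed = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for pair in pairs:
        total[pair.label_changes] += 1
        if predict(*pair.original) != predict(*pair.intervened):
            changed[pair.label_changes] += 1

    def rate(key: bool) -> float:
        return changed[key] / total[key] if total[key] else float("nan")

    return {"sensitivity": rate(True), "unwanted_change": rate(False)}


def lexical_heuristic(premise: str, hypothesis: str) -> str:
    """Toy stand-in for a model: looks only at a word in the hypothesis."""
    return "entailment" if "animal" in hypothesis else "non-entailment"


pairs = [
    # Context intervention (negation added): the gold label should change.
    InterventionPair(("I saw a dog", "I saw an animal"),
                     ("I did not see a dog", "I did not see an animal"), True),
    # Word-pair intervention that keeps the relation: the label should not change.
    InterventionPair(("I saw a dog", "I saw an animal"),
                     ("I saw a cat", "I saw an animal"), False),
    # Word-pair intervention that reverses the relation: the label should change.
    InterventionPair(("I saw a dog", "I saw an animal"),
                     ("I saw an animal", "I saw a dog"), True),
]

rates = prediction_change_rates(pairs, lexical_heuristic)
print(rates)
```

In this toy setup the heuristic ignores the context entirely, so it misses the negation intervention and scores a sensitivity of only 0.5, which is the kind of gap a purely accuracy-based evaluation could hide.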

Paper Structure (21 sections, 4 equations, 5 figures, 6 tables)

Introduction
Problem Formulation
A Structured NLI Subtask
The Causal Structure of Model Decision-Making
Diagram Specification
Estimating the Causal Effects
Total Causal Effects
Undesired Direct Causal Effects
Experimental Setup
Data and Interventions
Model Choice and Benchmark Comparison
Results and Discussion
Causal Effect of Inserted Word Pairs
Causal Effect of Contexts
Other Potential Direct Effects
...and 6 more sections

Figures (5)

Figure 1: We propose a causal intervention framework for systematically inspecting monotonicity reasoning in NLI models.
Figure 2: Causal Diagram for the Natural Logic Subtask
Figure 3: Specification of the causal diagram for possible routes of model reasoning for NLI-XY problems. Green edges indicate desired causal influence, while red edges indicate undesired paths of causal influence via surface-level heuristics.
Figure 4: Results for Insertion Interventions
Figure 5: Results for Context Interventions
