MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing

Vlad Andrei Negru, Robert Vacareanu, Camelia Lemnaru, Mihai Surdeanu, Rodica Potolea

TL;DR

We address semantic reasoning and dataset artifact challenges in natural language inference by proposing MorphNLI, a modular approach that morphs the premise toward the hypothesis via a sequence of atomic edits $M=(M_0,\dots,M_k)$, applies a standard NLI classifier to each adjacent pair $(M_i, M_{i+1})$, and aggregates the per-step labels into a final decision. The morphism generator is trained in a teacher–student setup: synthetic data is generated with in-context learning, a filtering stage prunes low-quality morphs, and a smaller morphism model is then fine-tuned for efficient inference. Cross-domain evaluation on MNLI and SICK shows that MorphNLI outperforms vanilla NLI baselines in out-of-domain (OOD) settings and yields more faithful, interpretable explanations through the morphing chain, though results vary by dataset and may be affected by potential contamination in LLM explanations. Overall, MorphNLI provides robust cross-domain NLI performance with transparent, stepwise reasoning, offering a practical route toward trustworthy and explainable inference in real-world applications.

Abstract

We introduce MorphNLI, a modular step-by-step approach to natural language inference (NLI). When classifying the premise-hypothesis pairs into {entailment, contradiction, neutral}, we use a language model to generate the necessary edits to incrementally transform (i.e., morph) the premise into the hypothesis. Then, using an off-the-shelf NLI model we track how the entailment progresses with these atomic changes, aggregating these intermediate labels into a final output. We demonstrate the advantages of our proposed method particularly in realistic cross-domain settings, where our method always outperforms strong baselines with improvements up to 12.6% (relative). Further, our proposed approach is explainable as the atomic edits can be used to understand the overall NLI label.
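
The pipeline described above (morph chain, per-step NLI labels, aggregation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the aggregation rule, so the rule in `aggregate` (any contradiction wins, otherwise any neutral wins, otherwise entailment) is a hypothetical choice. `toy_classifier` is a hard-coded stand-in for an off-the-shelf NLI model, and the example sentences are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Label = str  # one of "entailment", "neutral", "contradiction"


def aggregate(step_labels: Sequence[Label]) -> Label:
    """Combine per-step labels into one label.

    ASSUMPTION: this priority rule is hypothetical; the source text does not
    state how the intermediate labels are aggregated.
    """
    if "contradiction" in step_labels:
        return "contradiction"
    if "neutral" in step_labels:
        return "neutral"
    return "entailment"


def morph_nli(
    chain: Sequence[str],
    classify: Callable[[str, str], Label],
) -> Tuple[Label, List[Label]]:
    """Label each adjacent pair (M_i, M_{i+1}) of the chain, then aggregate."""
    step_labels = [classify(a, b) for a, b in zip(chain, chain[1:])]
    return aggregate(step_labels), step_labels


# Invented morph chain: M_0 is the premise, the last element is the hypothesis.
chain = [
    "A man is playing a guitar on stage.",
    "A man is playing an instrument on stage.",
    "A man is playing an instrument.",
]

# Hypothetical stand-in for an NLI model: a lookup table over the toy chain.
_TOY_LABELS = {
    (chain[0], chain[1]): "entailment",
    (chain[1], chain[2]): "entailment",
}


def toy_classifier(premise: str, hypothesis: str) -> Label:
    # Unknown pairs default to "neutral" in this stub.
    return _TOY_LABELS.get((premise, hypothesis), "neutral")


final_label, steps = morph_nli(chain, toy_classifier)
print(final_label, steps)
```

In practice, `classify` would wrap a real NLI model and `chain` would come from the fine-tuned morphing model. The returned list of step labels is what makes the final decision traceable to individual edits.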

Paper Structure

This paper contains 24 sections, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Natural language inference example where both a state-of-the-art encoder-decoder model, BART (left), and an LLM, GPT-4o (middle), predict the incorrect label. Our approach (right) incrementally morphs the premise into the hypothesis, which decomposes the inference process into several simpler steps. This allows it to generate the correct label, which is also associated with an intuitive explanation that follows naturally from the morphing steps. The LLM's explanation suggests overfitting on annotation artifacts from SNLI, which assumes coreference between participants and concepts in the two texts (Jiang and de Marneffe, 2022).
  • Figure 2: Training (top) and inference (bottom) for MorphNLI, including synthetic data generation for morphing. For the teacher model we use GPT-4; for the student model we use GPT-4o-mini.
  • Figure 3: Example of a short morphism for sentence $M_2$. The information about the context of the action ("on green grass") is lost when $M_2$ is generated. A similar context is then added in $M_3$ ("outside"), yielding a faulty neutral prediction because the connection "on green grass" $\rightarrow$ "outside" is lost.
  • Figure 4: Example of morphisms with no voice correction. Due to the difficulties caused by the change from passive to active voice between premise and hypothesis, the morphing model "hallucinates" inner sentences.
  • Figure 5: Misbehavior of GPT-4o related to the artifacts from SNLI. Underlining highlights the explanation fragments that are incorrect with respect to the italicized text in the premise/hypothesis. Here the model incorrectly assumes that the dog in the premise is the same as the dog in the hypothesis.
  • ...and 6 more figures