Neutralizing Bias in LLM Reasoning using Entailment Graphs

Liang Cheng, Tianyi Li, Zhaowei Wang, Tianyang Liu, Mark Steedman

TL;DR

This work tackles attestation bias in LLM-based Natural Language Inference (NLI), where models over-rely on memorized hypotheses rather than premises. It introduces an unsupervised pipeline that builds Entailment Graphs (EGs) from open-domain corpora, instantiates these graphs with typed entities to generate counterfactual NLI data, and fine-tunes LLMs using LoRA on this data. The approach yields significant reductions in attestation bias across multiple models and improves inferential performance, with especially strong gains for smaller models and robust improvements on bias-neutralized test sets. The bias-neutralized evaluation framework further enables a fair assessment of true reasoning capability, advancing robust NLI reasoning in practical settings and offering a pathway to broader task generalization in future work.

Abstract

LLMs are often claimed to be capable of Natural Language Inference (NLI), which is widely regarded as a cornerstone of more complex forms of reasoning. However, recent work shows that LLMs still hallucinate in NLI because of attestation bias: they rely too heavily on propositional memory and use it as a shortcut. To address this, we design an unsupervised framework that constructs counterfactual reasoning data and fine-tunes LLMs to reduce attestation bias. To measure bias reduction, we build bias-adversarial variants of NLI datasets in which predicates in premises are randomly replaced while hypotheses are kept unchanged. Extensive evaluations show that our framework significantly reduces hallucinations from attestation bias. We then evaluate LLMs fine-tuned with our framework on the original NLI datasets and on their bias-neutralized versions, in which the original entities are replaced with randomly sampled ones. The results show that our framework consistently improves inferential performance on both the original and the bias-neutralized NLI datasets.
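The two evaluation variants named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Triple` and `NLIExample` types, the function names, and the sample pools are hypothetical, and treating every random-premise example as not entailed is an assumption (a randomly drawn predicate could occasionally still entail the hypothesis).

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Triple:
    subj: str
    pred: str
    obj: str


@dataclass(frozen=True)
class NLIExample:
    premise: Triple
    hypothesis: Triple
    label: bool  # True means the premise entails the hypothesis


def random_premise_variant(ex, predicate_pool, rng):
    """Bias-adversarial variant: swap the premise predicate for a random one
    and keep the hypothesis unchanged. The label is set to False (assumption)."""
    candidates = [p for p in predicate_pool if p != ex.premise.pred]
    new_premise = replace(ex.premise, pred=rng.choice(candidates))
    return NLIExample(new_premise, ex.hypothesis, False)


def entity_neutralized_variant(ex, entity_pool, rng):
    """Bias-neutralized variant: replace the original entities with randomly
    sampled ones, consistently across premise and hypothesis; keep the label."""
    originals = sorted({ex.premise.subj, ex.premise.obj,
                        ex.hypothesis.subj, ex.hypothesis.obj})
    mapping = dict(zip(originals, rng.sample(entity_pool, len(originals))))

    def swap(t):
        return Triple(mapping[t.subj], t.pred, mapping[t.obj])

    return NLIExample(swap(ex.premise), swap(ex.hypothesis), ex.label)


rng = random.Random(0)
example = NLIExample(
    premise=Triple("Google", "bought", "YouTube"),
    hypothesis=Triple("Google", "owns", "YouTube"),
    label=True,
)
adversarial = random_premise_variant(example, ["bought", "visited", "criticized"], rng)
neutralized = entity_neutralized_variant(example, ["Acme", "Zorix", "Belmont"], rng)
```

In the adversarial variant the hypothesis stays attested while the premise no longer supports it, so a model that answers from memory will still say "Entail". In the neutralized variant the inference pattern is intact, but the entities are unlikely to form a memorized fact.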

Paper Structure

This paper contains 41 sections, 2 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: An example of attestation bias. LLMs tend to evaluate entailment using their memorized knowledge rather than the given premise.
  • Figure 2: The pipeline of our approach. Step 1: Build EGs in an unsupervised manner. Step 2: Instantiate predicates using random entities with matching types, then wrap the instantiated predicates into prompts to generate the training corpus.
  • Figure 3: The Attestation Bias scores for the original Levy/Holt, demonstrating a consistent attestation bias reduction after fine-tuning with EGs.
  • Figure 4: AUC scores of baseline (outline) and EG-trained (solid) LLMs on original (orange) and our bias-neutralized (blue) Levy/Holt.
  • Figure 5: The probability of predicting Entail on RPI Levy/Holt, conditioned on the LLMs' attestation of the hypothesis. Since predicting Entail in this setting is a false-positive hallucination, a lower probability is better. The figure shows that hallucinations decrease significantly after fine-tuning with EGs.
  • ...and 2 more figures
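
Step 2 of the pipeline in Figure 2 (instantiating typed predicates with random entities of matching types and wrapping them into prompts) might look roughly like the sketch below. The edge list, entity inventory, prompt wording, and function names are all hypothetical placeholders, and only positive (entailing) edges are shown; the paper's actual templates and sampling scheme may differ.

```python
import random

# Hypothetical typed entailment-graph edges:
# (premise template, hypothesis template, (type of a, type of b))
EG_EDGES = [
    ("{a} bought {b}", "{a} owns {b}", ("organization", "organization")),
    ("{a} was born in {b}", "{a} is from {b}", ("person", "location")),
]

# Hypothetical entity inventory, grouped by type.
ENTITIES_BY_TYPE = {
    "organization": ["Acme Corp", "Zorix Ltd", "Belmont Group"],
    "person": ["Dana Whitlock", "Ravi Menon"],
    "location": ["Tromso", "Valparaiso"],
}

# Assumed prompt wording, for illustration only.
PROMPT = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?"
)


def instantiate(edge, entities_by_type, rng):
    """Fill both sides of an edge with the same randomly drawn, type-matched entities."""
    premise_tpl, hypothesis_tpl, (type_a, type_b) = edge
    a = rng.choice(entities_by_type[type_a])
    # Avoid using the same entity in both argument slots.
    b = rng.choice([e for e in entities_by_type[type_b] if e != a])
    prompt = PROMPT.format(
        premise=premise_tpl.format(a=a, b=b),
        hypothesis=hypothesis_tpl.format(a=a, b=b),
    )
    return {"prompt": prompt, "label": "Entail"}


def build_corpus(edges, entities_by_type, n_per_edge, seed=0):
    """Generate n_per_edge counterfactual training examples for every edge."""
    rng = random.Random(seed)
    return [instantiate(edge, entities_by_type, rng)
            for edge in edges for _ in range(n_per_edge)]


corpus = build_corpus(EG_EDGES, ENTITIES_BY_TYPE, n_per_edge=3)
```

Because the entities are drawn at random, the instantiated hypotheses are unlikely to be attested facts, so the model can only get the label right by using the premise.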