PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise

Sapir Harary, Eran Hirsch, Aviv Slobodkin, David Wan, Mohit Bansal, Ido Dagan

TL;DR

This work introduces PrefixNLI, a prefix-level entailment task for improving factual faithfulness in autoregressive generation. It constructs dedicated prefix-level evaluation and training datasets and trains MiniTruePrefixes, which outperforms sentence-level baselines on prefix-level entailment. Integrated into a controlled decoding strategy, MiniTruePrefixes improves faithfulness across model sizes and datasets at a modest latency cost, and generalizes to other generator families. The work highlights both practical gains and future avenues for prefix-level entailment in training and decoding.

Abstract

Natural Language Inference (NLI) models have been used in various ways to improve the factuality of LLM outputs. This is typically done by applying an NLI model to judge whether the model output is entailed by the supposed evidence, and then triggering a corrective action, such as beam reranking at inference time or RL rewards during training. While NLI models are trained to detect factual inconsistencies over complete sentences, decisions in the common autoregressive generation architecture are made for each evolving text prefix during decoding. Addressing this setting, we generalize the entailment detection task to apply over arbitrary text prefixes, and suggest its utility for improving generation faithfulness. Providing suitable evaluation and training datasets for this task, we train MiniTruePrefixes, a novel specialized model that better detects factual inconsistencies over text prefixes, outperforming comparable baseline NLI models by 5-14 F1 points in prefix-level entailment. We further demonstrate that integrating MiniTruePrefixes into a controlled decoding framework substantially improves factual consistency in abstractive summarization. When guided by MiniTruePrefixes, LLaMA-3.2-3B-Instruct matches the faithfulness and runtime of the 8B model from the same model family, while using only half the memory.

Paper Structure

This paper contains 69 sections, 4 equations, 4 figures, 19 tables.

Figures (4)

  • Figure 1: Illustration of PrefixNLI and its downstream utilization for controlled decoding. During autoregressive generation, the base model favors a hallucinated token (“Jeremy”) that is not supported by the source document. Our MiniTruePrefixes model directly evaluates the factual consistency of candidate prefixes at each generation step, assigning here a low entailment score to this unfaithful continuation. For tokens with low entailment probability, we apply a penalty, effectively discouraging unfaithful continuations. This guides generation toward faithful outputs (“Nicky”) in a fine-grained and efficient manner.
  • Figure 2: Comparing faithfulness assessment approaches in controlled decoding. PrefixNLI evaluates the faithfulness of the generated prefix itself. In contrast, lookahead-based methods (wan2023faithfulness; sridhar2023improvedbeamsearchhallucination) first complete the summary before evaluation. This can be misleading when the completed summary is judged unfaithful because of a factual inconsistency that arises only within the speculated completion, as illustrated in the figure.
  • Figure 3: Creation of our SummEdits Prefixes evaluation dataset. Given a seed summary and a modified summary, we identify the hallucinated span $s$ in an unfaithful summary by locating where it differs from the seed (highlighted in red). Entailed prefixes are then derived from all positions up to $s$, while non-entailed prefixes are derived starting from the ending position of $s$.
  • Figure 4: F1 scores across prefix-length bins for MiniTruePrefixes and MiniTrue, with 95% confidence intervals.
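
The prefix-labeling procedure described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes whitespace tokenization, uses Python's standard `difflib.SequenceMatcher` to locate the span that differs from the seed, and treats everything between the first and last edit as a single span $s$. The function names and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def hallucinated_span(seed_tokens, modified_tokens):
    """Return (start, end) indices in modified_tokens covering every edit
    relative to the seed, or None if the two token lists are identical."""
    matcher = SequenceMatcher(a=seed_tokens, b=modified_tokens, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); j1/j2 index into modified_tokens.
    edits = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not edits:
        return None
    start = min(op[3] for op in edits)
    end = max(op[4] for op in edits)
    return start, end


def label_prefixes(seed, modified):
    """Split the prefixes of an unfaithful summary into entailed prefixes
    (ending before the span) and non-entailed prefixes (containing the
    whole span). Prefixes ending inside the span are skipped."""
    seed_tokens, mod_tokens = seed.split(), modified.split()
    span = hallucinated_span(seed_tokens, mod_tokens)
    if span is None or span[0] == span[1]:
        # Assumption: the edit inserts or replaces at least one token.
        raise ValueError("no hallucinated span found in the modified summary")
    start, end = span
    entailed = [" ".join(mod_tokens[:k]) for k in range(1, start + 1)]
    non_entailed = [" ".join(mod_tokens[:k])
                    for k in range(end, len(mod_tokens) + 1)]
    return entailed, non_entailed


# Hypothetical seed / modified pair, for illustration only.
seed = "The company reported a profit in 2020"
modified = "The company reported a loss in 2020"
entailed, non_entailed = label_prefixes(seed, modified)
```

Here `entailed` holds the four prefixes up to "The company reported a", and `non_entailed` holds the three prefixes from "The company reported a loss" onward.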
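
The penalty mechanism described in the Figure 1 caption can be illustrated with a small sketch. The caption states only that tokens with low entailment probability are penalized, so the threshold, the penalty size, and the subtractive form used here are assumptions made for illustration. The entailment scorer is a stand-in lookup table rather than MiniTruePrefixes, and all probabilities are invented.

```python
import math


def rerank_candidates(candidates, entail_prob, threshold=0.5, penalty=5.0):
    """Pick the next token after penalizing low-entailment candidates.

    candidates:  dict mapping token -> base-model log-probability.
    entail_prob: callable mapping token -> probability that the current
                 prefix extended with that token is entailed by the source.
    threshold, penalty: hypothetical hyperparameters (not from the paper).
    """
    adjusted = {}
    for token, logp in candidates.items():
        if entail_prob(token) < threshold:
            adjusted[token] = logp - penalty  # discourage unfaithful token
        else:
            adjusted[token] = logp            # leave faithful token as-is
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# Toy numbers mirroring Figure 1: the base model prefers "Jeremy",
# but the (stand-in) entailment scorer rates it as unsupported.
candidates = {
    "Jeremy": math.log(0.6),
    "Nicky": math.log(0.3),
    "The": math.log(0.1),
}
toy_scores = {"Jeremy": 0.05, "Nicky": 0.95, "The": 0.90}

best, adjusted = rerank_candidates(candidates, toy_scores.get)
```

With these toy values the penalty moves "Jeremy" below both alternatives, so `best` is "Nicky". In a real decoder the scorer would be called on each candidate prefix at every generation step, which is where the latency cost mentioned in the TL;DR comes from.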