Language Models with Conformal Factuality Guarantees

Christopher Mohri, Tatsunori Hashimoto

TL;DR

This work addresses the challenge of ensuring factual correctness in open-ended language model outputs by introducing conformal factuality, which grounds LM outputs in entailment-based uncertainty sets and applies split conformal prediction to achieve high-probability correctness guarantees. The authors formalize the approach, prove theoretical guarantees, and instantiate F_t via sub-claims to enable a practical back-off mechanism compatible with black-box LMs. Empirical results on FActScore, Natural Questions, and MATH with GPT-4 demonstrate substantial gains in factuality while preserving most of the original content, illustrating a viable path toward safer LM deployment with probabilistic guarantees. The framework is model-agnostic, data-efficient, and adaptable to distribution shifts, marking a significant step toward probabilistic correctness in real-world LM applications.

Abstract

Guaranteeing the correctness and factuality of language model (LM) outputs is a major open problem. In this work, we propose conformal factuality, a framework that can ensure high probability correctness guarantees for LMs by connecting language modeling and conformal prediction. We observe that the correctness of an LM output is equivalent to an uncertainty quantification problem, where the uncertainty sets are defined as the entailment set of an LM's output. Using this connection, we show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees by progressively making LM outputs less specific (and expanding the associated uncertainty sets). This approach applies to any black-box LM and requires very few human-annotated samples. Evaluations of our approach on closed book QA (FActScore, NaturalQuestions) and reasoning tasks (MATH) show that our approach can provide 80-90% correctness guarantees while retaining the majority of the LM's original output.
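The back-off idea described above can be illustrated with a minimal sketch. It assumes the output has already been split into sub-claims and that each sub-claim carries a confidence score; the `SubClaim` type, the `back_off` function, and the score values are hypothetical and chosen only for illustration. Raising the threshold removes more sub-claims, which makes the remaining output less specific.

```python
from dataclasses import dataclass


@dataclass
class SubClaim:
    text: str
    score: float  # hypothetical confidence score; higher means more trusted


def back_off(sub_claims, threshold):
    """Keep only the sub-claims whose score is at least `threshold`.

    A higher threshold drops more sub-claims, so the filtered output
    commits to fewer facts (a less specific statement).
    """
    return [c for c in sub_claims if c.score >= threshold]


# Illustrative sub-claims with made-up scores (not taken from the paper).
claims = [
    SubClaim("Abraham Lincoln was born in Kentucky.", 0.9),
    SubClaim("He worked as a lawyer.", 0.6),
    SubClaim("He served as the 16th U.S. president.", 0.95),
]

for t in (0.0, 0.7, 1.0):
    kept = back_off(claims, t)
    print(f"threshold={t}: kept {len(kept)} of {len(claims)} sub-claims")
```

In this sketch the threshold is supplied by hand; in the framework it would be chosen by conformal calibration on annotated examples.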

Paper Structure (20 sections, 3 theorems, 26 equations, 8 figures, 5 tables, 1 algorithm)

Key Result

Theorem 4.1

Let $\{X_i, Y_i^*\}_{i=1}^{n+1}$ be exchangeable, $\mathsf{F}_t$ be sound, and $\hat{q}_\alpha$ be defined as the $\frac{\lceil(n+1)(1-\alpha)\rceil}{n}$th quantile of the scores $\{r(X_i, Y^*_i)\}_{i=1}^n$, which we assume to be distinct without loss of generality. Then, if $\mathsf{E}(\mathsf{F}_t(\cdot))$ follows
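
The quantile $\hat{q}_\alpha$ in the theorem statement can be read as an order statistic of the $n$ calibration scores: the $\lceil(n+1)(1-\alpha)\rceil$-th smallest value. The sketch below computes it under that reading. The function name `conformal_quantile` is hypothetical, and returning infinity when the required rank exceeds $n$ is a common split conformal convention assumed here, not something stated in the text above.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    Assumption: if that rank exceeds n (too few calibration points for
    the requested alpha), return infinity as a conservative fallback.
    Note that floating-point rounding in (n+1)*(1-alpha) can shift the
    rank by one for some inputs.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Made-up calibration scores, n = 7.
calibration_scores = [3.0, 1.0, 4.0, 1.5, 9.0, 2.0, 6.0]

# alpha = 0.25 gives rank ceil(8 * 0.75) = 6, the 6th smallest score.
print(conformal_quantile(calibration_scores, 0.25))  # 6.0

# alpha = 0.05 gives rank ceil(8 * 0.95) = 8 > 7, so the fallback applies.
print(conformal_quantile(calibration_scores, 0.05))  # inf
```

The $(n+1)$ factor, rather than $n$, accounts for the test point being exchangeable with the calibration points, which is why the rank is slightly more conservative than the plain empirical $(1-\alpha)$ quantile.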

Figures (8)

  • Figure 1: Conformal factuality uses conformal prediction to ensure the correctness of LM outputs. Each potential LM output sequence (top) is associated with an uncertainty set (bottom) that contains every 'more specific' statement that entails it. Conformal prediction provides probabilistic guarantees that these uncertainty sets contain a correct answer (blue), which in turn guarantees the correctness of the associated output.
  • Figure 2: Example $\{\mathsf{F}_t(x)\}_{t\in\mathcal{T}}$ via sub-claims. Here we identified three sub-claims corresponding to (1) Abe Lincoln's birthplace, (2) his notable job, and (3) what he was best known for.
  • Figure 3: Target vs. empirical factuality. Each solid line starts at the base factuality of GPT-4 on the associated dataset. NQ and MATH overlap on the top right.
  • Figure 4: Factuality vs. percent of sub-claims removed across all datasets. Frequency scoring (red) can lead to significant ($20$-$50\%$) gains in correctness while retaining the majority of claims when compared to the base GPT-4 model (star). The tick marks correspond to different values of target $\alpha$, and the error bars represent standard error.
  • Figure 5: Histogram of percent of sub-claims removed for frequency scoring and $\alpha=0.2$ on FActScore.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Theorem 4.1
  • proof
  • Proposition 5.0
  • proof
  • Corollary 5.0
  • proof