Conditional Factuality Controlled LLMs with Generalization Certificates via Conformal Sampling

Kai Ye, Qingtao Pan, Shuo Li

Abstract

Large language models (LLMs) need reliable test-time control of hallucinations. Existing conformal methods for LLMs typically provide only \emph{marginal} guarantees and rely on a single global threshold, which can under-cover hard prompts, over-cover easy ones, and produce oversized prediction sets. We propose \emph{Conditional Factuality Control} (CFC), a post-hoc conformal framework that returns \emph{set-valued} outputs with \emph{conditional} coverage guarantees. CFC defines a continuous, feature-conditional acceptance threshold through augmented quantile regression on a latent ``success'' score, and deploys it through a fixed-point threshold rule at inference time. Theoretically, we show that CFC satisfies a conditional coverage guarantee under exchangeability and analyze its \emph{efficiency}, proving that, under mild assumptions on the score distributions, the conditional rule is strictly more sample-efficient than marginal conformal prediction at the same target coverage. We further derive a PAC-style variant, CFC-PAC, which shrinks the nominal risk level based on a stability bound, yielding a finite-sample certificate that the conditional miscoverage deviates from the target by at most $O(\sqrt{\log(1/\delta)/N})$. Empirically, on synthetic data, real-world reasoning and QA benchmarks, and a Flickr8k VLM setting, CFC and CFC-PAC consistently attain near-target coverage across difficulty groups while using smaller prediction sets than CP and non-CP baselines.
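The two ingredients the abstract describes — a feature-conditional acceptance threshold fit by quantile regression on a latent success score, and a PAC-style shrinkage of the nominal level — can be sketched in a few lines. This is a minimal illustration under assumed data and constants (the score model, feature map $\Phi(X)=(1,x)$, learning rate, and stability constant $c$ are all hypothetical, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x is a scalar "difficulty" feature per prompt, and
# s is a latent success score; harder prompts get lower, noisier scores.
N = 4000
x = rng.uniform(0.0, 1.0, N)
s = rng.normal(loc=-2.0 * x, scale=0.5 + x)

alpha = 0.10
Phi = np.column_stack([np.ones(N), x])   # linear feature map Phi(X)

# Fit Phi(X)^T beta to the alpha-quantile of s via the pinball (quantile)
# loss, minimized here with plain full-batch subgradient descent (an
# illustrative optimizer choice, not the paper's).
beta = np.zeros(2)
lr = 0.05
for _ in range(3000):
    r = s - Phi @ beta
    grad = -Phi.T @ np.where(r > 0, alpha, alpha - 1.0) / N
    beta -= lr * grad

# Feature-conditional acceptance threshold lambda_hat(x): accept when
# the score clears it, so roughly a (1 - alpha) fraction is accepted.
lam = Phi @ beta
print("empirical acceptance rate P(s >= lambda):", np.mean(s >= lam))

# CFC-PAC-style shrinkage (sketch): calibrate at a smaller nominal level
# alpha' = alpha - c * sqrt(log(1/delta) / N); c is a placeholder for the
# stability constant in the paper's bound.
delta, c = 0.05, 1.0
alpha_pac = alpha - c * np.sqrt(np.log(1.0 / delta) / N)
print("shrunk nominal level alpha':", round(alpha_pac, 4))
```

Because the threshold is linear in the difficulty feature, easy prompts end up with stricter (higher) thresholds and hard prompts with looser ones, matching the mechanism the figures below describe.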

Paper Structure

This paper contains 58 sections, 5 theorems, 108 equations, 6 figures, 13 tables, 2 algorithms.

Key Result

Theorem 4.1

Let $\mathcal{F}=\{\Phi(X)^\top\beta: \beta\in\mathbb{R}^d\}$ be any finite-dimensional linear class, and assume exchangeability. Then for any non-negative $f\in\mathcal{F}$ with $\mathbb{E}[f(X)]>0$, the prediction set $\widehat{C}_\alpha$ in Eq.~\eqref{eq:pre_set} satisfies the $f$-weighted conditional coverage guarantee
$$\mathbb{E}\big[f(X_{N+1})\,\mathbb{1}\{Y_{N+1}\in\widehat{C}_\alpha(X_{N+1})\}\big]\;\ge\;(1-\alpha)\,\mathbb{E}\big[f(X_{N+1})\big].$$
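A quick way to sanity-check a guarantee of this form is the special case where $\mathcal{F}$ is spanned by indicators of disjoint groups: the $f$-weighted bound then reduces to per-group coverage at level $1-\alpha$, achieved by running split conformal separately within each group. A minimal numeric sketch (the score model and group structure are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# F spanned by indicators of K disjoint groups: the guarantee
# E[f(X) 1{covered}] >= (1 - alpha) E[f(X)] becomes per-group
# coverage >= 1 - alpha, so calibrate a threshold per group.
K, n_cal, n_test, alpha = 4, 2000, 2000, 0.10

def scores(group, n):
    # Nonconformity scores whose spread grows with group "difficulty".
    return rng.normal(0.0, 1.0 + group, n)

coverages = []
for g in range(K):
    cal = scores(g, n_cal)
    test = scores(g, n_test)
    # Conformal quantile with the finite-sample (n + 1) correction.
    level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
    q = np.quantile(cal, level, method="higher")
    coverages.append(np.mean(test <= q))

print([round(c, 3) for c in coverages])  # each near 1 - alpha = 0.90
```

Each group's empirical coverage concentrates near $1-\alpha$ even though the score scales differ by group — exactly what a single marginal threshold fails to deliver.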

Figures (6)

  • Figure 1: Limitation of marginal CP and advantage of proposed CFC. Left: A single global threshold learned from the marginal score mixture yields only marginal coverage and can under‑cover hard prompts while over‑covering easy ones. Right: Our CFC learns a data‑dependent threshold via conformal quantile regression, adapting the acceptance level to input features and achieving conditional coverage across subgroups.
  • Figure 2: Groupwise miscoverage on synthetic data across 10 difficulty bins. The dashed line marks the target miscoverage $\alpha = 0.10$. Learnt CP improves over marginal baselines, but CFC and CFC-P remain closest to the target across all bins, especially on hard prompts.
  • Figure 3: Learned threshold $\widehat{\lambda}_\alpha(X)$ versus prompt difficulty. Easy prompts receive stricter thresholds, while harder prompts receive looser thresholds, which is the mechanism behind the improved group-wise reliability of CFC.
  • Figure 4: Groupwise miscoverage on real-world datasets at representative target errors: TriviaQA $\alpha=0.25$, GSM8K $\alpha=0.10$, and Flickr8k $\alpha=0.03$. Bars show mean miscoverage over split seeds and the upper error bars show one standard deviation. The TriviaQA panel uses the chosen two-group feature map; Appendix \ref{app:exp-details} gives its exact construction. The GSM8K and Flickr8k panels use five equal-frequency difficulty groups ordered from easy to hard. The dashed line marks the target miscoverage $\alpha$. Across all three datasets, the conditional methods flatten the miscoverage profile relative to marginal baselines, especially on the hardest groups.
  • Figure 5: Learned threshold $\widehat{\lambda}_\alpha(X)$ versus prompt difficulty in the updated synthetic run ($\alpha=0.10$, 5 bins). Easy prompts receive stricter thresholds, while harder prompts receive looser thresholds, explaining the improved group-wise coverage of CFC relative to global-threshold baselines.
  • ...and 1 more figure
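The groupwise plots above bin examples into equal-frequency difficulty groups and report miscoverage per bin. A minimal sketch of that evaluation step, with a placeholder per-example coverage indicator standing in for a real prediction-set method:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical evaluation sketch: given a difficulty score and a binary
# coverage indicator per example, form equal-frequency bins and report
# miscoverage per bin, as in the groupwise figures.
n, n_bins, alpha = 5000, 5, 0.10
difficulty = rng.uniform(0.0, 1.0, n)
covered = rng.random(n) < (1 - alpha)   # placeholder indicator

# Equal-frequency bin edges via quantiles of the difficulty score.
edges = np.quantile(difficulty, np.linspace(0.0, 1.0, n_bins + 1))
bins = np.clip(np.searchsorted(edges, difficulty, side="right") - 1,
               0, n_bins - 1)

miscov = np.array([1.0 - covered[bins == b].mean() for b in range(n_bins)])
print(np.round(miscov, 3))  # each bin's miscoverage, easy to hard
```

For a method with conditional coverage the resulting profile is flat near $\alpha$; a marginal method instead shows the easy-bin/hard-bin imbalance that Figures 2 and 4 illustrate.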

Theorems & Definitions (5)

  • Theorem 4.1: Conditional coverage of CFC
  • Theorem 4.2: PAC conditional coverage for CFC
  • Proposition 4.3: Oracle CFC efficiency
  • Theorem 4.4: CFC inherits oracle efficiency
  • Theorem A.1: Conditional coverage for linear classes (\citet{gibbs2024conformal}, Theorem 2)