Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering

Klaus-Rudolf Kladny, Bernhard Schölkopf, Michael Muehlebach

TL;DR

This work addresses the lack of statistical guarantees for generative model outputs by introducing SCOPE-Gen, a sequential conformal prediction framework that combines an i.i.d. generation stage with greedy filtering stages. The key idea is to exploit a Markov-chain factorization of admissibility across the three steps (one generation step followed by two filter steps), so that each step can be calibrated independently with a one-dimensional conformal prediction procedure, which reduces the number of costly admissibility evaluations. Empirical results on natural language generation and molecular graph extension show substantial reductions in queries, time, and final set size compared to baselines such as CLM, while maintaining the desired admissibility level $1-\alpha$. The approach is relevant for safety-critical applications where human oracle checks are expensive, offering a scalable and provably reliable way to generate admissible outputs from black-box generative models.

Abstract

Generative models lack rigorous statistical guarantees for their outputs and are therefore unreliable in safety-critical applications. In this work, we propose Sequential Conformal Prediction for Generative Models (SCOPE-Gen), a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee called conformal admissibility control. This guarantee states that, with high probability, the prediction sets contain at least one admissible (or valid) example. To this end, our method first samples an initial set of i.i.d. examples from a black-box generative model. Then, this set is iteratively pruned via so-called greedy filters. As a consequence of the iterative generation procedure, admissibility of the final prediction set factorizes as a Markov chain. This factorization is crucial because it allows each factor to be controlled separately using conformal prediction. In comparison to prior work, our method demonstrates a large reduction in the number of admissibility evaluations during calibration. This reduction is important in safety-critical applications, where these evaluations must be conducted manually by domain experts and are therefore costly and time-consuming. We highlight the advantages of our method in terms of admissibility evaluations and cardinality of the prediction sets through experiments in natural language generation and molecular graph extension tasks.
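
To make the factorization argument concrete, the following is a minimal worked sketch under assumed notation, not the paper's exact statement. Let $A_s$ denote the event that the prediction set after step $s$ contains at least one admissible example, and let $\alpha_s$ be a hypothetical per-step error level. Because filters only remove examples, $A_2 \subseteq A_1 \subseteq A_0$, so the chain rule gives:

```latex
\begin{align*}
\mathbb{P}(A_2)
  &= \mathbb{P}(A_0)\,\mathbb{P}(A_1 \mid A_0)\,\mathbb{P}(A_2 \mid A_1) \\
  &\geq (1-\alpha_0)(1-\alpha_1)(1-\alpha_2)
  \quad \text{if each factor is controlled at level } 1-\alpha_s .
\end{align*}
% Example (assumed equal split): for a target 1 - \alpha = 0.7,
% choosing 1 - \alpha_s = 0.7^{1/3} \approx 0.888 for s = 0, 1, 2
% yields (1-\alpha_0)(1-\alpha_1)(1-\alpha_2) = 0.7 .
```

Under this reading, each factor is a one-dimensional calibration problem, and any choice of per-step levels whose product is at least $1-\alpha$ suffices; an equal split is only one possible allocation.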

Paper Structure (49 sections, 36 equations, 4 figures, 1 table, 3 algorithms)

Figures (4)

  • Figure 1: SCOPE-Gen for Radiology Report Generation. The sequential prediction procedure can be separated into two stages: in the generation stage ($s=0$), i.i.d. text reports (blue tokens) are drawn from the generative model. In the filter stage ($s \in \{1, 2\}$), the prediction set from the previous step $s-1$ is refined to remove examples with low quality (removing the false response in red) and examples with high similarity to already sampled texts (the answers in green are similar). Each step is performed via iterative (sub-)sampling in a given order (indicated by the purple circles), until the value of a non-conformity variable $\nu$ exceeds a calibrated threshold $\lambda_{(s)}^*$.
  • Figure 2: SCOPE-Gen Savings on Admissibility Checks. Samples in brackets denote samples that do not need to be assessed for admissibility (the ones sampled after the first admissible one). For the generation step (a), the $y_i^{(j)}$ are i.i.d. samples from the generative model $G$. For filter steps (b), the $y_i^{(j)}$ are examples from the previous prediction set $\mathcal{C}_{(s-1)}(x_i)$, ordered according to $\texttt{sub\_sample}$.
  • Figure 3: Admissibility Analysis for MIMIC-CXR. For all non-conformity scores, SCOPE-Gen (config. 1, see \ref{appx:alphas_choice}; blue) becomes less conservative with respect to the desired admissibility level $1-\alpha$ (red dashed line) as the number of calibration samples $n$ increases, as is typical for methods based on conformal prediction. For CLM (orange), in contrast, conservativeness depends strongly on the chosen non-conformity measure.
  • Figure 4: calibrate_generation
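
The captions of Figures 1 and 2 describe sampling until a non-conformity variable exceeds a calibrated threshold, and skipping admissibility checks after the first admissible sample. The sketch below illustrates that idea for the generation step only, under simplifying assumptions: the non-conformity score is taken to be the number of samples drawn (the paper allows more general choices of $\nu$), the threshold is the standard split-conformal quantile, and all function names are hypothetical rather than taken from the authors' code.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Y = TypeVar("Y")
INF = math.inf


def score_until_first_admissible(
    samples: Sequence[Y], is_admissible: Callable[[Y], bool]
) -> Tuple[float, int]:
    """Check samples in order and stop at the first admissible one.

    Returns (score, checks): the 1-based position of the first admissible
    sample (inf if there is none) and the number of oracle calls made.
    Samples after the first admissible one are never checked.
    """
    for position, y in enumerate(samples, start=1):
        if is_admissible(y):
            return position, position
    return INF, len(samples)


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or inf."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return INF  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def calibrate_count_threshold(
    calibration_sets: Sequence[Sequence[Y]],
    is_admissible: Callable[[Y], bool],
    alpha: float,
) -> Tuple[float, int]:
    """Calibrate a sample-count threshold and report total oracle calls."""
    scores: List[float] = []
    total_checks = 0
    for samples in calibration_sets:
        score, checks = score_until_first_admissible(samples, is_admissible)
        scores.append(score)
        total_checks += checks
    return conformal_quantile(scores, alpha), total_checks


def generate_prediction_set(
    draw: Callable[[], Y], threshold: float, max_samples: int
) -> List[Y]:
    """Test time: draw i.i.d. samples until the count reaches the threshold.

    max_samples caps the set size, which matters when the threshold is inf.
    """
    budget = max_samples if threshold == INF else min(int(threshold), max_samples)
    return [draw() for _ in range(budget)]
```

In this sketch, a calibration example whose first admissible sample appears at position $k$ costs $k$ oracle calls instead of one call per sample, which is the saving indicated by the bracketed samples in Figure 2. The filter steps would follow the same pattern with the samples ordered by the sub-sampling rule instead of drawn i.i.d.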

Theorems & Definitions (1)

  • proof