ConfHit: Conformal Generative Design with Oracle Free Guarantees

Siddhartha Laghuvarapu, Ying Jin, Jimeng Sun

TL;DR

Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.

Abstract

The success of deep generative models in scientific discovery requires not only the ability to generate novel candidates but also reliable guarantees that these candidates indeed satisfy desired properties. Recent conformal-prediction methods offer a path to such guarantees, but their application to generative modeling in drug discovery is limited by budget constraints, lack of oracle access, and distribution shift. To this end, we introduce ConfHit, a distribution-free framework that provides validity guarantees under these conditions. ConfHit formalizes two central questions: (i) Certification: whether a generated batch can be guaranteed to contain at least one hit with a user-specified confidence level, and (ii) Design: whether the generation can be refined to a compact set without weakening this guarantee. ConfHit leverages weighted exchangeability between historical and generated samples to eliminate the need for an experimental oracle, constructs a multiple-sample, density-ratio-weighted conformal p-value to quantify statistical confidence in hits, and proposes a nested testing procedure to certify and refine candidate sets of multiple generated samples while maintaining statistical guarantees. Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.
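
To make the idea of a density-ratio-weighted conformal p-value concrete, the following is a minimal sketch of the generic weighted form used in weighted conformal inference. It is not the paper's multiple-sample construction: the function name `weighted_conformal_pvalue`, the convention that larger scores look more hit-like, and the simplified randomization (only the test point's own weight is scaled by a uniform draw) are all assumptions made here for illustration.

```python
import random
from typing import Optional, Sequence


def weighted_conformal_pvalue(
    calib_scores: Sequence[float],
    calib_weights: Sequence[float],
    test_score: float,
    test_weight: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Generic density-ratio-weighted conformal p-value (illustrative form).

    calib_weights[i] is an (assumed, already estimated) density ratio at
    calibration point i, and test_weight is the ratio at the test point.
    A small p-value means the test score is unusually high relative to the
    reweighted calibration scores.
    """
    if len(calib_scores) != len(calib_weights):
        raise ValueError("scores and weights must have the same length")
    # Randomized variant: scale the test point's own weight by U ~ Uniform(0, 1).
    # Without an rng, U = 1 gives the deterministic (more conservative) p-value.
    u = rng.random() if rng is not None else 1.0
    # Weighted mass of calibration scores at least as large as the test score.
    mass_at_or_above = sum(
        w for s, w in zip(calib_scores, calib_weights) if s >= test_score
    )
    total = sum(calib_weights) + test_weight
    return (mass_at_or_above + u * test_weight) / total
```

With all weights equal to 1 this reduces to the usual rank-based conformal p-value; upweighting calibration points that resemble the generated samples is what lets the calibration set stand in for the shifted test distribution.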

Paper Structure

This paper contains 68 sections, 6 theorems, 38 equations, 21 figures, 14 tables, and 3 algorithms.

Key Result

Theorem 3.1

Under the covariate shift assumption, it holds for any fixed $t \in [0,1]$ that [display equation missing in the source]. Therefore, the certification test function $\psi(\mathcal{D}_{\text{calib}}, \{X_{n+j}\}_{j=1}^N) = \mathds{1}\{p^{\text{rand}}_N \leq \alpha\}$ achieves the certification guarantee (eq:def_certify).
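
The thresholding rule $\mathds{1}\{p \leq \alpha\}$ can be illustrated with a toy simulation. The sketch below covers only the simplest special case, assumed here for illustration: no distribution shift (all density ratios equal to 1), a single test score, and continuous scores. In that setting a randomized conformal p-value is uniform on $[0,1]$ when the test score is exchangeable with the calibration scores, so the rule certifies in roughly an $\alpha$ fraction of such trials. The helper names are hypothetical and the paper's weighted, multiple-sample p-value is not reproduced.

```python
import random


def randomized_pvalue(calib_scores, test_score, u):
    """Unweighted randomized conformal p-value (all density ratios equal to 1)."""
    greater = sum(1 for s in calib_scores if s > test_score)
    ties = sum(1 for s in calib_scores if s == test_score)
    # The test point counts as a tie with itself, hence the "1 +".
    return (greater + u * (1 + ties)) / (len(calib_scores) + 1)


def certify(p_value, alpha):
    """Test function psi: 1 if the p-value is at most alpha, else 0."""
    return 1 if p_value <= alpha else 0


def null_rejection_rate(alpha, n_calib=50, n_trials=10000, seed=0):
    """Fraction of trials certified when the test score is exchangeable
    with the calibration scores (toy stand-in for the 'no hit' case)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        calib = [rng.random() for _ in range(n_calib)]
        test = rng.random()
        p = randomized_pvalue(calib, test, rng.random())
        rejections += certify(p, alpha)
    return rejections / n_trials
```

Calling `null_rejection_rate(0.1)` should return a value close to 0.1, up to Monte Carlo noise; this is the kind of error control that a plot of realized error against $\alpha$ checks against the $y=x$ line.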

Figures (21)

  • Figure 1: (a) Problem setup: given an input, certify and generate a set of candidates that contains at least one "hit" (green) with probability at least $1-\alpha$. (b) ConfHit workflow. Given a nested sequence of candidate batches, we estimate the density ratio between labeled data and generated samples, compute a conformal p-value for each batch to quantify the confidence in it containing a hit, and return the smallest batch whose p-value falls below $\alpha$.
  • Figure 2: Certification results. Left: realized error rates at fixed $N$ for different models and error levels $\alpha$ in SBDD (upper) and CMO (lower). Middle: average error rates while varying budget $N$. Right: power, i.e., the fraction of actives certified at various error levels $\alpha$ and budgets $N$. The dashed line denotes the ideal $y=x$ error bound. Our method consistently achieves valid coverage across scenarios. Results are averaged over 5 random runs; error bars and additional results for other values of $N$ are in Appendix \ref{app:test_stat}.
  • Figure 3: Design results. Error rate at fixed $N$ for different methods (left), mean set sizes averaged across methods at different values of $N$ (middle), and empty set percentage at different values of $N$ (right) across target levels $\alpha$. The top row shows results for SBDD and the bottom row for CMO. The dashed black line in the error plots indicates the ideal $y=x$ bound. ConfHit achieves tight error control while producing substantially smaller sets. Results are averaged over 5 random runs; additional results are provided in Appendix \ref{app:test_stat}.
  • Figure 4: Comparison of score statistics on TargetDiff (SBDD, $N=5$). Left: rejection rate (power) at $\alpha=0.1, 0.3$. Right: error vs. $\alpha$. All statistics remain valid; the max statistic shows the highest power.
  • Figure 5: (a) Distribution shift adjustment (left). Coverage violations when ConfHit is run without density correction. (b) Budget analysis (middle and right). Fraction of inputs with at least one hit under increasing generation budget. Solid lines: fraction of actual hits; dashed lines: predicted fraction of hits.
  • ...and 16 more figures
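
The workflow in the Figure 1 caption (scan a nested sequence of batches and return the smallest one whose p-value falls below $\alpha$) can be sketched as follows. This is a hedged illustration only: `smallest_certified_batch` and the `batch_pvalue` callback are hypothetical names, the nesting is assumed to be by prefixes of an ordered candidate list, and the conditions under which stopping at the first rejection preserves the guarantee come from the paper's nested testing argument, which is not reproduced here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def smallest_certified_batch(
    candidates: Sequence[T],
    batch_pvalue: Callable[[Sequence[T]], float],
    alpha: float,
) -> List[T]:
    """Return the smallest prefix of `candidates` whose p-value is <= alpha.

    `batch_pvalue` is a user-supplied function mapping a batch to a
    conformal p-value (assumed given). An empty list means that no
    prefix could be certified at level alpha.
    """
    for k in range(1, len(candidates) + 1):
        batch = candidates[:k]  # nested batches: first k candidates
        if batch_pvalue(batch) <= alpha:
            return list(batch)
    return []
```

For example, with a toy p-value that halves with each added candidate, `smallest_certified_batch(["m1", "m2", "m3", "m4", "m5"], lambda b: 0.5 ** len(b), 0.1)` returns the first four candidates, since $0.5^3 = 0.125 > 0.1$ and $0.5^4 = 0.0625 \leq 0.1$. The empty return value corresponds to the empty sets counted in the design results.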

Theorems & Definitions (9)

  • Theorem 3.1
  • Remark 3.2: Outlier detection
  • Remark 3.3: Estimated density ratio
  • Theorem 3.4
  • Theorem 3.5: Robustness to estimation error
  • Theorem A.1
  • Lemma A.2
  • Remark A.3
  • Proposition A.4