Calibrating Generative Models

Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe

TL;DR

Calibrating Generative Models (CGM) addresses miscalibration of a generative model's sampling distribution by casting calibration as a constrained KL minimization: minimize $D_{\mathrm{KL}}(p_{\theta}\|p_{\theta_{\text{base}}})$ subject to $\mathbb{E}_{p_{\theta}}[\mathbf{h}(\mathbf{x})]=\mathbf{h}^*$. It introduces two surrogate objectives, CGM-relax and CGM-reward, that enable tractable optimization with unbiased gradient estimators. CGM-relax combines a miscalibration penalty with a KL penalty, while CGM-reward targets a maximum-entropy tilt $p_{\boldsymbol{\alpha}}$ and minimizes $D_{\mathrm{KL}}(p_{\theta}\|p_{\boldsymbol{\hat{\alpha}}_N})$. Across protein design, conditional image generation, and language tasks, CGM reduces calibration error across hundreds of constraints on models with up to $10^9$ parameters, with minimal degradation in sample quality. The work highlights residual challenges in rare-event calibration and notes the current framework's reliance on tractable likelihoods, pointing to future work on extending calibration to implicit models such as VAEs, GANs, and other non-likelihood-based frameworks.

Abstract

Generative models frequently suffer miscalibration, wherein class probabilities and other statistics of the sampling distribution deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to one billion parameters, spanning applications in protein design, image generation, and language modeling.
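As an illustration of the relax idea described above, the following is a minimal sketch on a three-outcome categorical model, where the expectation and the KL term can be computed exactly. The squared-error form of the miscalibration penalty, the plain gradient descent loop, and all names (`relax_loss`, `calibrate_relax`, `H`, `lam`) are assumptions made for this toy; the paper's actual objective and its sample-based gradient estimators may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relax_loss(theta, theta_base, H, h_star, lam):
    """Assumed toy objective: ||E_p[h] - h*||^2 + lam * KL(p_theta || p_base).

    H has one row per outcome x, holding the statistic vector h(x).
    """
    p, p_base = softmax(theta), softmax(theta_base)
    miscal = H.T @ p - h_star
    kl = np.sum(p * (np.log(p) - np.log(p_base)))
    return float(miscal @ miscal + lam * kl)

def relax_grad(theta, theta_base, H, h_star, lam):
    """Exact gradient of relax_loss with respect to the logits theta."""
    p, p_base = softmax(theta), softmax(theta_base)
    miscal = H.T @ p - h_star
    # Gradient with respect to the probabilities p.
    g_p = 2.0 * (H @ miscal) + lam * (np.log(p) - np.log(p_base) + 1.0)
    # Chain rule through softmax: dp_i/dtheta_j = p_i * (delta_ij - p_j).
    return p * (g_p - p @ g_p)

def calibrate_relax(theta_base, H, h_star, lam=1e-3, lr=0.5, steps=5000):
    """Plain gradient descent started from the base model's logits."""
    theta = theta_base.copy()
    for _ in range(steps):
        theta -= lr * relax_grad(theta, theta_base, H, h_star, lam)
    return theta

# Base model puts 10% mass on outcome 2; the target is 30%.
theta_base = np.log(np.array([0.7, 0.2, 0.1]))
H = np.array([[0.0], [0.0], [1.0]])   # h(x) = 1 if x == 2 else 0
h_star = np.array([0.3])

theta_cal = calibrate_relax(theta_base, H, h_star)
p_cal = softmax(theta_cal)
print("calibrated probabilities:", p_cal)
```

In this toy, a smaller `lam` pushes the calibrated probability of outcome 2 closer to the target, while a larger `lam` keeps the model closer to the base distribution.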

Paper Structure

This paper contains 34 sections, 15 theorems, 97 equations, 10 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.1

Under assumptions, there exists a unique solution to the maximum-entropy problem (eq:prob-max-entropy) that has the form
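The statement is cut off in this summary. For orientation only: the classical solution of a KL minimization under moment constraints is an exponential tilt of the base model, $p_{\boldsymbol{\alpha}}(\mathbf{x}) \propto p_{\theta_{\text{base}}}(\mathbf{x})\exp(\boldsymbol{\alpha}^\top \mathbf{h}(\mathbf{x}))$, with $\boldsymbol{\alpha}$ chosen so that $\mathbb{E}_{p_{\boldsymbol{\alpha}}}[\mathbf{h}(\mathbf{x})]=\mathbf{h}^*$. This is a standard result assumed here, not a quotation of Theorem 2.1. The sketch below estimates such an $\boldsymbol{\alpha}$ from $N$ base-model samples by running Newton's method on the convex dual $\log \frac{1}{N}\sum_i \exp(\boldsymbol{\alpha}^\top \mathbf{h}(\mathbf{x}_i)) - \boldsymbol{\alpha}^\top \mathbf{h}^*$. The estimator, the 1D two-mode mixture, and the names `fit_tilt` and `tilt_weights` are illustrative assumptions; the summary does not say how $\boldsymbol{\hat{\alpha}}_N$ is actually computed.

```python
import numpy as np

def tilt_weights(h_samples, alpha):
    """Self-normalized weights proportional to exp(alpha . h(x_i))."""
    logits = h_samples @ alpha
    w = np.exp(logits - logits.max())
    return w / w.sum()

def fit_tilt(h_samples, h_star, steps=50):
    """Undamped Newton iterations on the sample dual (adequate for this toy).

    The dual gradient is the tilted mean of h minus h_star, and the
    Hessian is the tilted covariance of h.
    """
    d = h_samples.shape[1]
    alpha = np.zeros(d)
    for _ in range(steps):
        w = tilt_weights(h_samples, alpha)
        mean = w @ h_samples
        centered = h_samples - mean
        cov = (centered * w[:, None]).T @ centered
        # Tiny ridge term keeps the linear solve well-posed.
        alpha -= np.linalg.solve(cov + 1e-10 * np.eye(d), mean - h_star)
    return alpha

# Hypothetical base model: 1D two-mode Gaussian mixture, 20% mass on the right mode.
rng = np.random.default_rng(0)
N = 20000
right = rng.random(N) < 0.2
x = np.where(right, rng.normal(2.0, 0.5, N), rng.normal(-2.0, 0.5, N))

h = (x > 0).astype(float)[:, None]    # h(x) = 1 if x falls in the right mode
h_star = np.array([0.5])              # target: equal mass on both modes

alpha_hat = fit_tilt(h, h_star)
w = tilt_weights(h, alpha_hat)
print("alpha_hat:", alpha_hat, "tilted E[h]:", w @ h)
```

For a single binary statistic with base probability $q$, the population value has the closed form $\alpha^\ast = \log\frac{h^*(1-q)}{(1-h^*)\,q}$, which equals $\log 4$ for $q = 0.2$ and $h^* = 0.5$; the sample estimate should approach it as $N$ grows.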

Figures (10)

  • Figure 1: Calibrating mixture proportions in a 1D GMM. A: The CGM-relax and CGM-reward solutions closely approximate the maximum entropy solution. B (top): The CGM-relax regularization parameter $\lambda$ trades off constraint satisfaction against closeness to the base model. B (bottom): CGM-reward is accurate when enough samples $N$ are used to estimate $\boldsymbol{\alpha}^\ast$.
  • Figure 2: A: CGM effectively upweights the probability of a rare mode in a 1D GMM. B: CGM-relax calibrates the base model to up to $10^3$ constraints, whereas CGM-reward is not well-defined for ${>}30$ constraints. When $\widehat{\boldsymbol{\alpha}}_N$ is fixed to $\boldsymbol{\alpha}^\ast$ (red dashed line), CGM-relax outperforms CGM-reward.
  • Figure 3: A: Samples from the Genie2 protein generative model before and after calibration with CGM-relax $(\lambda{=}10^{-3})$. B: CGM-relax reduces the distance of secondary structure content to natural proteins by more than $4\times$ for Genie2 and more than $2\times$ for ESM3 while maintaining biophysical plausibility.
  • Figure 4: Generations from the conditional TarFlow model (Zhai et al.) before and after calibration with CGM-relax $(\lambda = 10^{-4})$. CGM reweights the proportions of animals generated and produces realistic images. Some visual artifacts exist after calibration (see, e.g., the fox).
  • Figure 5: A: Gender imbalance and distance from the base model (symmetrized KL from pre-trained TinyStories-33M). B: Gender imbalance for professions included in and held out from calibration, before and after CGM-relax ($\lambda = 0.1$). Points below the diagonal were improved by CGM.
  • ...and 5 more figures

Theorems & Definitions (27)

  • Theorem 2.1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • ...and 17 more