Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models

Yanbo Xu, Yu Wu, Sungjae Park, Zhizhuo Zhou, Shubham Tulsiani

TL;DR

Temporal Score Rescaling (TSR) provides a training-free mechanism to steer the sampling diversity of pre-trained diffusion and flow models by locally scaling the score ∇log p_t(x) with a time-dependent factor, enabling sharper or flatter local distributions without retraining. The authors derive an explicit rescaling function r_t(k, σ) and show it reduces to a simple linear relation for Gaussian mixtures, while empirically validating TSR on toy distributions and across domains such as image generation, depth estimation, pose prediction, protein design, and robotic manipulation. TSR is compatible with both deterministic and stochastic samplers and is orthogonal to classifier-free guidance, offering reliable improvements in precision for depth/pose tasks and enhanced diversity for image generation. While general theoretical guarantees are limited and global reweighting remains a challenge, TSR provides a practical, plug-and-play approach to control sampling temperature at inference and can be tuned per task to balance likelihood and diversity in real-world applications.
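As a sanity check on why a time-dependent factor is needed, the single-Gaussian case can be worked out by hand. This is a sketch under assumptions that are not taken from the paper: the data are $x_0 \sim \mathcal{N}(\mu, s^2 I)$, the forward process is $x_t = a_t x_0 + b_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and "temperature $k$" means sampling from $p_0^k \propto \mathcal{N}(\mu, s^2/k\, I)$. The symbols $s$, $a_t$, $b_t$, and $\eta_t$ are introduced here for illustration, and the paper's $r_t(k, \sigma)$ may be parameterized differently.

$$
\nabla \log p_t(x) = -\frac{x - a_t \mu}{a_t^2 s^2 + b_t^2},
\qquad
\nabla \log p_t^{(k)}(x) = -\frac{x - a_t \mu}{a_t^2 s^2 / k + b_t^2}
$$

$$
\frac{\nabla \log p_t^{(k)}(x)}{\nabla \log p_t(x)}
= \frac{a_t^2 s^2 + b_t^2}{a_t^2 s^2 / k + b_t^2}
= \frac{k\,(\eta_t + 1)}{\eta_t + k},
\qquad
\eta_t = \frac{a_t^2 s^2}{b_t^2}
$$

In this toy case the ratio tends to $1$ when noise dominates ($\eta_t \to 0$) and to $k$ when signal dominates ($\eta_t \to \infty$), so a constant multiplier would not reproduce the sharpened distribution at every noise level.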

Abstract

We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution. We build on the observation that these models leverage (learned) score functions of noisy data distributions for sampling and show that rescaling these allows one to effectively control a 'local' sampling temperature. Notably, this approach does not require any finetuning or alterations to training strategy, can be applied to any off-the-shelf model, and is compatible with both deterministic and stochastic samplers. We first validate our framework on toy 2D data, and then demonstrate its application for diffusion models trained across five disparate tasks: image generation, pose estimation, depth prediction, robot manipulation, and protein design. We find that across these tasks, our approach allows sampling from sharper (or flatter) distributions, yielding performance gains. For example, depth prediction models benefit from sampling more likely depth estimates, whereas image generation models perform better when sampling a slightly flatter distribution. Project page: https://temporalscorerescaling.github.io
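The idea of plugging a rescaled score into an unchanged sampler can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes 1D Gaussian data with a known exact score, a variance-exploding forward process $x_t = x_0 + t\,\epsilon$, and a plain Euler probability-flow ODE sampler. The function names (`gaussian_score`, `rescale_factor`, `sample_ode`) and the single-Gaussian form of the factor are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_score(x, t, mu, s):
    """Exact score of p_t = N(mu, s^2 + t^2) for x_t = x_0 + t * eps (assumed setup)."""
    return -(x - mu) / (s**2 + t**2)

def rescale_factor(t, k, s):
    """Illustrative time-dependent factor: ratio of the noisy score of
    N(mu, s^2 / k) to that of N(mu, s^2). Not necessarily the paper's r_t."""
    return (s**2 + t**2) / (s**2 / k + t**2)

def sample_ode(score_fn, x_init, t_max, n_steps=2000):
    """Euler integration of dx/dt = -t * score(x, t) from t_max down to 0."""
    ts = np.linspace(t_max, 0.0, n_steps + 1)
    x = x_init.copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * (-t_cur * score_fn(x, t_cur))
    return x

mu, s, k, t_max = 1.0, 0.5, 4.0, 20.0
rng = np.random.default_rng(0)
# Both runs start from the same noise; only the score passed to the sampler differs.
x_init = mu + np.sqrt(s**2 + t_max**2) * rng.standard_normal(20000)

base = sample_ode(lambda x, t: gaussian_score(x, t, mu, s), x_init, t_max)
sharp = sample_ode(
    lambda x, t: rescale_factor(t, k, s) * gaussian_score(x, t, mu, s),
    x_init,
    t_max,
)
print("std without rescaling:", base.std())
print("std with rescaling:   ", sharp.std())
```

Under these assumptions the two standard deviations should land near `s` and `s / sqrt(k)` respectively, up to discretization and sampling error. The sampler itself is untouched, which mirrors the training-free, plug-in character described in the abstract.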

Paper Structure

This paper contains 30 sections, 7 theorems, 57 equations, 16 figures, 4 tables.

Key Result

Theorem B.2

For $Error(t)$, there exist two upper bounds:

Figures (16)

  • Figure 1: Temporal Score Rescaling (TSR) provides a mechanism to steer the sampling diversity of diffusion and flow models at inference. Top-left: Probability density evolution when sampling a 1D Gaussian mixture with DDPM, and the effect of TSR, which can control the sampling process to yield sharper or flatter distributions. Top-right, bottom: TSR can be applied to any pre-trained diffusion or flow model, improving performance across diverse domains such as pose prediction, depth estimation, and image generation.
  • Figure 2: Comparison on Uniform Mixture of 1D Isotropic Gaussians. The uniform Gaussian mixture is divided into two classes (subplot 1). We apply CFG, CNS, and TSR to scale the conditional distribution of Class 1 (subplot 2). CFG and CNS lead to non-uniform weights and tend to lose modes, while TSR preserves all modes and effectively reduces the variance of the samples.
  • Figure 2: Pose Prediction. Mean error (deg) and accuracy within thresholds 0.2, 0.5, 1. $(k, \sigma) = (7.0, 0.5)$ for TSR, $k=1600$ for CNS.
  • Figure 3: Left: Comparison on 2D Checkerboard and Swiss Roll Distributions. We compare samples from CNS and TSR. While CNS biases sampling towards the central modes and drops peripheral ones, TSR preserves all modes while reducing variance without generating divergent samples. Right: Effect of Hyperparameters on the Rescaling Factor. In the rightmost column, we plot the TSR rescaling factor $r_t$ on the y-axis against diffusion time $t$. With $\sigma=1.0$, varying $k$ controls the asymptotic value of $r_t$ (top); with $k=2.0$, varying $\sigma$ determines how early rescaling takes effect during sampling (bottom).
  • Figure 4: Qualitative Examples for Varying $k$. TSR allows for tuning the generated outputs to be more diverse and detailed (lower $k$) or smoother and more likely (higher $k$). While neither extreme is desirable, we notice that a $k$ slightly smaller than 1 gives pleasing images with enhanced details.
  • ...and 11 more figures

Theorems & Definitions (14)

  • Definition B.1: Error in TSR Score Approximation
  • Theorem B.2: Upper Bound of the Error
  • Theorem B.3: Vanishing Behavior of Error
  • Lemma B.4
  • Lemma B.5
  • Lemma B.6
  • Proof of Theorem \ref{thm:bound}
  • Proof of Theorem \ref{thm:main}
  • Proof of Lemma \ref{lem: bound 1}
  • Proof of Lemma \ref{lem: exp bound}
  • ...and 4 more