Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models
Yanbo Xu, Yu Wu, Sungjae Park, Zhizhuo Zhou, Shubham Tulsiani
TL;DR
Temporal Score Rescaling (TSR) provides a training-free mechanism to steer the sampling diversity of pre-trained diffusion and flow models by locally scaling the score ∇log p_t(x) with a time-dependent factor, enabling sharper or flatter local distributions without retraining. The authors derive an explicit rescaling function r_t(k, σ) and show it reduces to a simple linear relation for Gaussian mixtures, while empirically validating TSR on toy distributions and across domains such as image generation, depth estimation, pose prediction, protein design, and robotic manipulation. TSR is compatible with both deterministic and stochastic samplers and is orthogonal to classifier-free guidance, offering reliable improvements in precision for depth/pose tasks and enhanced diversity for image generation. While general theoretical guarantees are limited and global reweighting remains a challenge, TSR provides a practical, plug-and-play approach to control sampling temperature at inference and can be tuned per task to balance likelihood and diversity in real-world applications.
Abstract
We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution. We build on the observation that these models leverage (learned) score functions of noisy data distributions for sampling, and show that rescaling these scores allows one to effectively control a 'local' sampling temperature. Notably, this approach does not require any finetuning or alterations to the training strategy, can be applied to any off-the-shelf model, and is compatible with both deterministic and stochastic samplers. We first validate our framework on toy 2D data, and then demonstrate its application for diffusion models trained across five disparate tasks: image generation, pose estimation, depth prediction, robot manipulation, and protein design. We find that across these tasks, our approach allows sampling from sharper (or flatter) distributions, yielding performance gains. For example, depth prediction models benefit from sampling more likely depth estimates, whereas image generation models perform better when sampling a slightly flatter distribution. Project page: https://temporalscorerescaling.github.io
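To make the idea of a time-dependent score rescaling concrete, the following is a minimal sketch for the simplest possible case: a single zero-mean 1D Gaussian. It is an illustration under stated assumptions, not the paper's derivation or implementation. It assumes the common forward process x_t = alpha_t * x_0 + sigma_t * eps, and the names `gaussian_score`, `gaussian_rescale_factor`, and `data_std` are hypothetical. Raising a Gaussian density to the power k divides its variance by k, so the score of the noised tempered distribution can be compared directly with the original noised score.

```python
import numpy as np


def gaussian_score(x, alpha_t, sigma_t, data_std):
    """Score of the noised marginal N(0, alpha_t^2 * data_std^2 + sigma_t^2)."""
    return -x / (alpha_t**2 * data_std**2 + sigma_t**2)


def gaussian_rescale_factor(k, alpha_t, sigma_t, data_std):
    """Ratio of the tempered noised score to the original noised score.

    Assumption: the data distribution is a single Gaussian, so tempering
    with exponent k (k > 1 is sharper, k < 1 is flatter) divides the data
    variance by k.
    """
    original_var = alpha_t**2 * data_std**2 + sigma_t**2
    tempered_var = alpha_t**2 * data_std**2 / k + sigma_t**2
    return original_var / tempered_var


k, data_std = 4.0, 2.0
x = np.linspace(-3.0, 3.0, 7)

# Illustrative (alpha_t, sigma_t) pairs, from nearly clean to nearly pure noise.
for alpha_t, sigma_t in [(1.0, 1e-3), (0.8, 0.6), (0.1, 0.995)]:
    factor = gaussian_rescale_factor(k, alpha_t, sigma_t, data_std)
    rescaled = factor * gaussian_score(x, alpha_t, sigma_t, data_std)
    target = gaussian_score(x, alpha_t, sigma_t, data_std / np.sqrt(k))
    # The rescaled score matches the score of the noised tempered Gaussian.
    assert np.allclose(rescaled, target)
    print(f"alpha_t={alpha_t}, sigma_t={sigma_t}: factor={factor:.3f}")
```

In this toy case the factor approaches k when the noise level is small and approaches 1 when noise dominates, which is why a constant multiplier on the score would not reproduce the tempered distribution at every timestep. How this extends to mixtures and learned scores is the subject of the paper itself.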
