Understanding temperature tuning in energy-based models
Peter W Fields, Vudtiwat Ngampruetikorn, David J Schwab, Stephanie E Palmer
TL;DR
This paper explains why post-hoc temperature adjustments improve generative outputs in sparse-data regimes. The authors develop a physically motivated framework that uses forward and reverse KL divergences to quantify the fidelity-diversity trade-off and to define an optimal sampling temperature. Using a simple toy model and a structured Ising landscape, they show that the optimal temperature depends on both the data and the energy landscape, and that improving generative performance may require either raising or lowering the sampling temperature τ. The work offers a diagnostic perspective for evaluating learned distributions and for guiding robust training strategies in biological sequence design and related high-dimensional systems.
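For reference, the quantities named above can be written out in standard notation. This is a sketch, not the authors' exact formulation: E_θ denotes the learned energy function, p the true data distribution, and the equal-weight sum used to define τ* is an assumption, since the paper may combine or weight the two divergences differently.

```latex
\begin{aligned}
q_\tau(x) &= \frac{e^{-E_\theta(x)/\tau}}{Z_\tau},
  \qquad Z_\tau = \sum_{x} e^{-E_\theta(x)/\tau},\\[4pt]
D_{\mathrm{KL}}(p \,\|\, q_\tau) &= \sum_x p(x)\,\log\frac{p(x)}{q_\tau(x)}
  &&\text{(forward: large when real states are under-sampled, i.e. lost diversity)},\\[4pt]
D_{\mathrm{KL}}(q_\tau \,\|\, p) &= \sum_x q_\tau(x)\,\log\frac{q_\tau(x)}{p(x)}
  &&\text{(reverse: large when unrealistic states are over-sampled, i.e. lost fidelity)},\\[4pt]
\tau^{*} &= \arg\min_{\tau>0}\,\bigl[D_{\mathrm{KL}}(p \,\|\, q_\tau) + D_{\mathrm{KL}}(q_\tau \,\|\, p)\bigr].
\end{aligned}
```

Setting τ = 1 recovers the learned model; τ < 1 concentrates probability on low-energy states, and τ > 1 spreads it toward high-energy states.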
Abstract
Generative models of complex systems often require post-hoc parameter adjustments to produce useful outputs. For example, energy-based models for protein design are sampled at an artificially low "temperature" to generate novel, functional sequences. This temperature tuning is a common yet poorly understood heuristic used across machine learning contexts to control the trade-off between generative fidelity and diversity. Here, we develop an interpretable, physically motivated framework to explain this phenomenon. We demonstrate that in systems with a large "energy gap", which separates a small fraction of meaningful states from a vast space of unrealistic states, learning from sparse data causes models to systematically overestimate high-energy state probabilities, a bias that lowering the sampling temperature corrects. More generally, we characterize how the optimal sampling temperature depends on the interplay between data size and the system's underlying energy landscape. Crucially, our results show that lowering the sampling temperature is not always desirable; we identify the conditions where raising it results in better generative performance. Our framework thus casts post-hoc temperature tuning as a diagnostic tool that reveals properties of the true data distribution and the limits of the learned model.
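The energy-gap argument can be illustrated numerically. The sketch below is a hypothetical toy, not the authors' model: it assumes a two-level landscape (a few states at energy 0, the rest at energy `gap`), uses pseudocount-smoothed empirical frequencies from a sparse sample as a stand-in for a learned energy-based model, and picks τ by grid search on the equal-weight sum of forward and reverse KL. The function names (`kl`, `temper`, `optimal_tau`) and all parameter values are illustrative assumptions. The flat-landscape, nearly unsmoothed case is a contrived contrast showing that an over-sharp model can benefit from a higher temperature; it is not the set of conditions identified in the paper.

```python
import numpy as np


def kl(a, b):
    """Discrete KL divergence D(a || b), using the convention 0 * log 0 = 0."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


def temper(q, tau):
    """Return q_tau(x) proportional to q(x)**(1/tau), i.e. exp(-E(x)/tau) with E = -log q."""
    logits = np.log(q) / tau
    logits -= logits.max()  # shift before exponentiating to avoid overflow
    w = np.exp(logits)
    return w / w.sum()


def optimal_tau(gap, n_low=10, n_states=1000, n_samples=50, pseudocount=1.0, seed=0):
    """Grid-search tau for a hypothetical two-level landscape; returns (tau*, taus, objective)."""
    rng = np.random.default_rng(seed)
    # True distribution: n_low "meaningful" states at energy 0, all others at
    # energy `gap` (in units of kT), so p(x) is proportional to exp(-E(x)).
    energy = np.full(n_states, float(gap))
    energy[:n_low] = 0.0
    p = np.exp(-energy)
    p /= p.sum()
    # Stand-in for a learned model: pseudocount-smoothed frequencies from a
    # sparse sample; the pseudocount sets how much mass unseen states get.
    data = rng.choice(n_states, size=n_samples, p=p)
    counts = np.bincount(data, minlength=n_states)
    q = (counts + pseudocount) / (n_samples + pseudocount * n_states)
    # Assumed criterion: equal-weight sum of forward and reverse KL.
    taus = np.linspace(0.05, 3.0, 60)
    obj = np.array([kl(p, temper(q, t)) + kl(temper(q, t), p) for t in taus])
    return taus[np.argmin(obj)], taus, obj


# Large energy gap, smoothed sparse fit: the model over-weights high-energy states.
tau_gap, _, _ = optimal_tau(gap=8.0, pseudocount=1.0)
# Flat landscape, nearly unsmoothed sparse fit: the model is too sharp.
tau_flat, _, _ = optimal_tau(gap=0.0, pseudocount=0.01)
print(f"energy-gap case: tau* = {tau_gap:.2f}; flat case: tau* = {tau_flat:.2f}")
```

With these settings the first minimizer should fall below τ = 1 (smoothing spreads mass over the many high-energy states, and cooling removes it), while the second should fall above τ = 1 (the fit concentrates on the few observed states, and heating restores coverage). The exact values depend on the seed, the grid, and the assumed criterion.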
