Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
Wenpin Tang, Fuzhong Zhou
TL;DR
The paper addresses reward collapse and loss of diversity in fine-tuning diffusion samplers by formulating entropy-regularized objectives and solving them via stochastic control. It derives a closed-form tilt of the pretrained distribution, develops a path-space control framework with Girsanov-based connections, and provides a Hamilton–Jacobi–Bellman (HJB) solution to obtain optimal controls and initial distributions. Beyond entropy, the work extends to general $f$-divergence regularizers, deriving surrogate reward transformations and corresponding HJB equations to guide sampling and initialization. Numerical experiments on Stable Diffusion v1.5 show that forward KL and $\gamma$-divergence regularization outperform standard KL regularization, achieving higher aesthetic reward with less drift and fewer artifacts, especially at larger exploration levels.
Abstract
This paper aims to develop a rigorous treatment of the problem of entropy-regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to use stochastic control for sample generation, where the entropy regularizer is introduced to mitigate reward collapse. We also show how the analysis can be extended to fine-tuning with a general $f$-divergence regularizer. Numerical experiments on a large-scale text-to-image model, Stable Diffusion v1.5, are conducted to validate our approach.
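
To make the role of the entropy regularizer concrete, the following is a minimal sketch on a discrete toy problem, not the paper's continuous-time construction. It assumes the standard KL-regularized objective $\mathbb{E}_p[r] - \alpha\,\mathrm{KL}(p \,\|\, p_{\mathrm{pre}})$, whose maximizer is the exponential tilt $p^*(x) \propto p_{\mathrm{pre}}(x)\exp(r(x)/\alpha)$. The symbol $\alpha$, the four-point distribution `p_pre`, the `reward` vector, and both function names are illustrative assumptions.

```python
import numpy as np

def tilted_distribution(p_pre, reward, alpha):
    """Return p*(x) proportional to p_pre(x) * exp(reward(x) / alpha).

    Assumes p_pre is strictly positive so that its logarithm is finite.
    """
    logits = np.log(p_pre) + reward / alpha
    logits -= logits.max()  # shift for numerical stability; cancels on normalization
    weights = np.exp(logits)
    return weights / weights.sum()

def objective(p, p_pre, reward, alpha):
    """Expected reward minus alpha * KL(p || p_pre), for strictly positive p."""
    kl = np.sum(p * (np.log(p) - np.log(p_pre)))
    return float(np.dot(p, reward) - alpha * kl)

# Hypothetical toy "pretrained" distribution and reward over four outcomes.
p_pre = np.array([0.4, 0.3, 0.2, 0.1])
reward = np.array([0.0, 1.0, 2.0, 3.0])

for alpha in (0.1, 1.0, 10.0):
    p_star = tilted_distribution(p_pre, reward, alpha)
    print(f"alpha={alpha}: p*={np.round(p_star, 3)}, "
          f"objective={objective(p_star, p_pre, reward, alpha):.4f}")
```

In this toy setting a small $\alpha$ puts nearly all mass on the highest-reward outcome, a discrete analogue of reward collapse, while a large $\alpha$ keeps the tilted distribution close to `p_pre`.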
