Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Z. Xiao, Weiyang Liu, Yoshua Bengio, Dinghuai Zhang
TL;DR
This work tackles the problem of finetuning pretrained diffusion models to align with reward functions while preserving prior structure and sample diversity. It introduces Nabla-GFlowNet (∇-GFlowNet), a gradient-informed GFlowNet framework that leverages reward gradients through the gradient-informed Detailed Balance (∇-DB) objective. A residual variant (residual ∇-DB) couples finetuning with the pretrained prior to maintain natural image priors, and a forward-looking (FL) flow reparameterization speeds up credit assignment over long denoising trajectories. Empirical results on Stable Diffusion across multiple reward models demonstrate faster convergence, improved sample diversity (measured with DreamSim), and better prior preservation (lower FID) compared to gradient-free and gradient-aware baselines, highlighting its effectiveness for efficient, diversity-preserving reward-driven diffusion alignment.
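As a rough illustration of what "gradient-informed" detailed balance can mean, the sketch below differentiates the log-space detailed balance condition log F(x_t) + log P_F(x_{t-1} | x_t) = log F(x_{t-1}) + log P_B(x_t | x_{t-1}) with respect to x_{t-1}. The unknown log F(x_t) term then drops out. At the final step, ∇ log F(x_0) can be replaced by β∇R(x_0), under the assumption that the finetuned model should sample in proportion to exp(βR(x_0)). Everything here is an assumption for illustration: Gaussian forward and backward policies on a toy vector state, hand-written NumPy gradients in place of autodiff, and hypothetical function names (`grad_db_residual`, `grad_db_loss`). It is not the paper's exact ∇-DB loss and omits the residual and forward-looking variants.

```python
import numpy as np


def grad_log_gaussian(x, mean, var):
    """Gradient of log N(x; mean, var * I) with respect to x."""
    return -(x - mean) / var


def grad_db_residual(x_t, x_prev, fwd_mean, fwd_var, bwd_scale, bwd_var,
                     grad_log_flow_prev):
    """Gradient w.r.t. x_prev of the log-space detailed balance residual

        log F(x_t) + log P_F(x_prev | x_t) - log F(x_prev) - log P_B(x_t | x_prev).

    log F(x_t) does not depend on x_prev, so it vanishes from the gradient.
    Hypothetical Gaussian policies assumed for this sketch:
      P_F(x_prev | x_t) = N(fwd_mean, fwd_var * I)   (fwd_mean computed from x_t)
      P_B(x_t | x_prev) = N(bwd_scale * x_prev, bwd_var * I)
    """
    g_forward = grad_log_gaussian(x_prev, fwd_mean, fwd_var)
    # d/dx_prev of -||x_t - s * x_prev||^2 / (2 v) equals s * (x_t - s * x_prev) / v.
    g_backward = bwd_scale * (x_t - bwd_scale * x_prev) / bwd_var
    return g_forward - grad_log_flow_prev(x_prev) - g_backward


def grad_db_loss(*args, **kwargs):
    """Squared norm of the gradient residual, averaged over the batch axis."""
    r = grad_db_residual(*args, **kwargs)
    return float(np.mean(np.sum(r ** 2, axis=-1)))


# Toy usage at the terminal step x_{t-1} = x_0 (all values hypothetical).
beta = 10.0
target = np.array([1.0, -1.0])


def grad_reward(x0):
    # Toy reward R(x0) = -||x0 - target||^2, so grad R(x0) = -2 (x0 - target).
    return -2.0 * (x0 - target)


def grad_log_flow_terminal(x0):
    # Assumes F(x0) is proportional to exp(beta * R(x0)).
    return beta * grad_reward(x0)


rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 2))
x0 = rng.normal(size=(4, 2))
loss = grad_db_loss(x_t, x0, fwd_mean=np.sqrt(0.9) * x_t, fwd_var=0.1,
                    bwd_scale=np.sqrt(0.9), bwd_var=0.1,
                    grad_log_flow_prev=grad_log_flow_terminal)
print(loss)
```

Working with gradients of the balance condition, rather than the condition itself, is one way to inject ∇R(x_0) directly into the training signal while avoiding the need to estimate log F(x_t) at the same state.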
Abstract
While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desirable to align and finetune pretrained diffusion models with reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from a lack of diversity in generated samples, a lack of prior preservation, and/or slow convergence in finetuning. In response to this challenge, we take inspiration from recent successes in generative flow networks (GFlowNets) and propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet (abbreviated as $\nabla$-GFlowNet), that leverages the rich signal in reward gradients for probabilistic diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.
