Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Zhen Liu, Tim Z. Xiao, Weiyang Liu, Yoshua Bengio, Dinghuai Zhang

TL;DR

This work tackles the problem of finetuning pretrained diffusion models to align with reward functions while preserving prior structure and sample diversity. It introduces Nabla-GFlowNet (∇-GFlowNet), a gradient-informed GFlowNet framework that leverages reward gradients through the gradient-informed Detailed Balance (∇-DB) objective. A residual variant (residual ∇-DB) couples finetuning with the pretrained prior to maintain natural image priors, and a forward-looking (FL) flow reparameterization speeds credit assignment for long sequences. Empirical results on Stable Diffusion across multiple reward models demonstrate faster convergence, improved diversity (DreamSim), and better prior preservation (lower FID) compared to gradient-free and gradient-aware baselines, highlighting its effectiveness for efficient, diverse reward-driven diffusion alignment.

Abstract

While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetune pretrained diffusion models with some reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. In response to this challenge, we take inspiration from recent successes in generative flow networks (GFlowNets) and propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet (abbreviated as $\nabla$-GFlowNet), that leverages the rich signal in reward gradients for probabilistic diffusion finetuning. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model, on different realistic reward functions.

Paper Structure

This paper contains 36 sections, 3 theorems, 50 equations, 31 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

If $L_{\overrightarrow{\nabla}\text{DB}}(x_t, x_{t+1})= L_{\overleftarrow{\nabla}\text{DB}}(x_{t}, x_{t+1})=0$ for every denoising transition $(x_t, x_{t+1})$ over the state space and $L_{\nabla\text{DB-terminal}}(x_T)=0$ for every terminal state $x_T$, then the resulting forward policy generates samples
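The loss definitions are not reproduced in this excerpt, so the following is only a hedged sketch of the idea behind a gradient-informed balance condition. It assumes the standard GFlowNet detailed balance condition $F(x_t) P_F(x_{t+1}|x_t) = F(x_{t+1}) P_B(x_t|x_{t+1})$; taking logs and differentiating with respect to $x_{t+1}$ or $x_t$ gives score-level residuals that must vanish when balance holds. The one-dimensional Gaussian transition, the constants `A` and `S2`, and the function names are hypothetical illustrations, not the paper's objective or code.

```python
import numpy as np

# Hypothetical 1-D toy: x0 ~ N(0, 1) and x1 | x0 ~ N(A * x0, S2).
A, S2 = 0.8, 0.5
V = A**2 + S2  # marginal variance of x1, so F(x1) is proportional to N(0, V)


def forward_residual(x0, x1):
    """d/dx1 of [log P_F(x1|x0) - log F(x1) - log P_B(x0|x1)]."""
    score_pf = -(x1 - A * x0) / S2          # forward policy score w.r.t. x1
    score_flow = -x1 / V                    # flow score at x1
    score_pb = A * (x0 - A * x1 / V) / S2   # exact Gaussian posterior score w.r.t. x1
    return score_pf - score_flow - score_pb


def reverse_residual(x0, x1, flow_scale=1.0):
    """d/dx0 of [log F(x0) + log P_F(x1|x0) - log P_B(x0|x1)].

    flow_scale is an assumed knob: 1.0 is the true flow, other values
    deliberately mis-specify F(x0) so the residual becomes non-zero.
    """
    score_flow = -flow_scale * x0               # flow score at x0
    score_pf = A * (x1 - A * x0) / S2           # forward policy score w.r.t. x0
    score_pb = -(x0 - A * x1 / V) * V / S2      # posterior score w.r.t. x0
    return score_flow + score_pf - score_pb


rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=5), rng.normal(size=5)
print(np.abs(forward_residual(x0, x1)).max())        # numerically zero
print(np.abs(reverse_residual(x0, x1)).max())        # numerically zero
print(np.abs(reverse_residual(x0, x1, 2.0)).max())   # non-zero: balance is violated
```

In this toy setting both residuals vanish for the exact flow and backward policy, and squaring them would give losses that are zero exactly when the score-level balance holds on the sampled transitions. Whether this matches the paper's $\nabla$-DB losses term by term is an assumption.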

Figures (31)

  • Figure 1: Left: Illustration of the proposed residual $\nabla$-DB objective, along with its forward-looking variant. The two "forces" on each image in each transition $x_t \rightarrow x_{t+1}$ of a trajectory $\tau = (x_0, x_1, ..., x_T)$ are expected to sum to zero. Green and blue terms represent forward and reverse residual policy scores, respectively; orange terms represent signals from terminal rewards; and pink terms represent flow scores or residual flow scores, each of which is defined in Section \ref{sec:method}. Note that the reward term on $x_T$ in the final transition differs from the others. Right: Images generated by a model finetuned with the proposed residual $\nabla$-DB on the Aesthetic Score reward. The text prompt for each row is shown on the left. In each row, the leftmost image is generated by the pretrained model and the rightmost one by the model finetuned for 200 iterations.
  • Figure 2: Comparison between images generated by models finetuned with different methods for a maximum of $200$ update steps. For each method, we pick the checkpoint that produces images with the highest rewards without semantic collapse, since methods like ReFL and DRaFT-LV collapse easily (as illustrated in Fig. \ref{fig:robustness}). For each method, we also show the average reward of the presented images.
  • Figure 3: Our $\nabla$-GFlowNet finetuning yields more stable output than the baselines.
  • Figure 4: Qualitative results on HPSv2.
  • Figure 5: Qualitative results on ImageReward.
  • ...and 26 more figures

Theorems & Definitions (10)

  • Proposition 1
  • Remark 2
  • Remark 3
  • Proposition 4
  • Remark 5
  • Remark 6
  • Proof
  • Theorem 7
  • Remark 8
  • Remark 9