Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui

TL;DR

This paper tackles the challenge of aligning diffusion models with nuanced user preferences without incurring prohibitive inference costs. It introduces Diffusion-Sharpening, a trajectory-level fine-tuning framework that uses a path-integral reward mechanism to select high-quality denoising trajectories, along with two practical variants: SFT-Diffusion-Sharpening and RLHF-Diffusion-Sharpening. The approach integrates approximate $x_0$ estimation via PF-ODE and aggregates rewards over multiple trajectories, enabling efficient training and amortized inference. Empirical results show faster convergence during training and no additional NFEs at inference, and the method consistently outperforms RL-based fine-tuning and sampling trajectory optimization methods on text alignment, compositionality, and human-preference metrics. The work provides a scalable, reward-agnostic pathway for diffusion-model fine-tuning with practical implications for deployment.

Abstract

We propose Diffusion-Sharpening, a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories. Existing RL-based fine-tuning methods focus on single training timesteps and neglect trajectory-level alignment, while recent sampling trajectory optimization methods incur significant inference NFE costs. Diffusion-Sharpening overcomes this by using a path integral framework to select optimal trajectories during training, leveraging reward feedback, and amortizing inference costs. Our method demonstrates superior training efficiency with faster convergence, and best inference efficiency without requiring additional NFEs. Extensive experiments show that Diffusion-Sharpening outperforms RL-based fine-tuning methods (e.g., Diffusion-DPO) and sampling trajectory optimization methods (e.g., Inference Scaling) across diverse metrics including text alignment, compositional capabilities, and human preferences, offering a scalable and efficient solution for future diffusion model fine-tuning. Code: https://github.com/Gen-Verse/Diffusion-Sharpening
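
To make the idea of trajectory-level selection concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a scalar state in place of an image, a hypothetical reward `toy_reward` in place of a learned reward model, and a plain sum over steps as a discrete stand-in for the path-integral reward; all function names and constants are illustrative assumptions. It samples several candidate trajectories, scores each by its aggregated reward, and keeps the best one, which is the kind of signal a training loop could then learn from.

```python
import random


def toy_reward(x0_estimate):
    """Hypothetical reward: largest (zero) when the estimate equals the target 1.0."""
    return -abs(x0_estimate - 1.0)


def sample_trajectory(rng, num_steps):
    """Toy stand-in for one denoising run: returns one x0 estimate per step."""
    x = rng.gauss(0.0, 1.0)  # start from noise
    estimates = []
    for _ in range(num_steps):
        # Move part of the way toward the target and add a little step noise.
        x = x + 0.5 * (1.0 - x) + rng.gauss(0.0, 0.1)
        estimates.append(x)
    return estimates


def path_reward(estimates):
    """Aggregate per-step rewards over the whole trajectory (a discrete sum)."""
    return sum(toy_reward(e) for e in estimates)


def select_best_trajectory(num_candidates, num_steps, seed=0):
    """Sample candidate trajectories and return the one with the highest path reward."""
    rng = random.Random(seed)
    candidates = [sample_trajectory(rng, num_steps) for _ in range(num_candidates)]
    scores = [path_reward(c) for c in candidates]
    best_index = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best_index], scores


best_traj, scores = select_best_trajectory(num_candidates=4, num_steps=5)
print("path rewards:", [round(s, 3) for s in scores])
print("best path reward:", round(path_reward(best_traj), 3))
```

In this sketch the selection happens only while candidates are being compared, which mirrors the abstract's point that the cost of searching over trajectories is paid during training rather than as extra NFEs at inference.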

Paper Structure

This paper contains 38 sections, 11 equations, 10 figures, 3 tables, 2 algorithms.

Figures (10)

  • Figure 1: Comparison of Three Diffusion-Based Methods for Reward-Driven Optimization: (i) Diffusion Reinforcement Learning, (ii) Diffusion Sampling Trajectory Optimization, and (iii) Diffusion Sharpening.
  • Figure 2: Overview of Our Diffusion Sharpening Framework: (i) Training, (ii) Inference, and (iii) Reward Model Selection
  • Figure 3: Qualitative results comparing Diffusion Sharpening methods using different reward models. The images show the generated results with CLIP Score, Compositional Reward, MLLM, and Human Preferences as reward models, showcasing the effectiveness of SFT Diffusion Sharpening and RLHF Diffusion Sharpening in diffusion finetuning.
  • Figure 4: SDXL Finetuning Loss across Different Datasets. Here "Diffusion-Sharpening" represents SFT Diffusion-Sharpening specifically.
  • Figure 5: Inference Performance of Diffusion Sharpening.
  • ...and 5 more figures