Improving Diffusion Generalization with Weak-to-Strong Segmented Guidance

Liangyu Yuan, Yufei Huang, Mingkun Lei, Tong Zhao, Ruoyu Wang, Changxi Chi, Yiwei Wang, Chi Zhang

Abstract

Diffusion models generate synthetic images through an iterative refinement process. However, the misalignment between the simulation-free training objective and the iterative sampling process often causes gradient errors to accumulate along the sampling trajectory, which leads to unsatisfactory results and a failure to generalize. Guidance techniques such as Classifier-Free Guidance (CFG) and AutoGuidance (AG) alleviate this by extrapolating between a main signal and an inferior signal to strengthen generalization. Despite their empirical success, the effective operating regimes of prevalent guidance methods remain under-explored, which makes it unclear which guidance method to choose under a given precondition. In this work, we first conduct synthetic comparisons to isolate and demonstrate the effective regimes of guidance methods, represented by CFG and AG, from the perspective of the weak-to-strong (W2S) principle. Building on this, we propose SGG, a hybrid instantiation of the principle that combines the benefits of both. Furthermore, we demonstrate that the W2S principle, together with SGG, can be migrated into the training objective, improving the generalization ability of unguided diffusion models. We validate our approach with comprehensive experiments. At inference time, evaluations on SD3 and SD3.5 confirm that SGG outperforms existing training-free guidance variants. Training-time experiments on transformer architectures demonstrate effective migration and performance gains in both conditional and unconditional settings. Code is available at https://github.com/851695e35/SGG.
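For orientation, the "extrapolation between a main and an inferior signal" shared by CFG and AG can be written in the common form below. This is a sketch based on the standard formulations of CFG (Ho and Salimans) and AutoGuidance (Karras et al.), not this paper's exact notation: the symbols $\mathbf{v}_\theta$ (main velocity prediction), $\mathbf{v}_{\theta'}$ (a smaller or less-trained version of the main model), $\varnothing$ (the null condition) and the guidance weight $w$ are introduced here for illustration. Setting $w = 1$ recovers the unguided conditional prediction.

```latex
\begin{aligned}
\tilde{\mathbf{v}}(\mathbf{x}_t, t, c) &= \mathbf{v}_{\mathrm{weak}}(\mathbf{x}_t, t, c) + w\,\bigl(\mathbf{v}_\theta(\mathbf{x}_t, t, c) - \mathbf{v}_{\mathrm{weak}}(\mathbf{x}_t, t, c)\bigr), \quad w \ge 1,\\
\text{CFG:}\quad \mathbf{v}_{\mathrm{weak}}(\mathbf{x}_t, t, c) &= \mathbf{v}_\theta(\mathbf{x}_t, t, \varnothing),\\
\text{AG:}\quad \mathbf{v}_{\mathrm{weak}}(\mathbf{x}_t, t, c) &= \mathbf{v}_{\theta'}(\mathbf{x}_t, t, c).
\end{aligned}
```

The two methods differ only in how the weak signal is constructed: CFG weakens the model by dropping the condition, while AG weakens the model itself and keeps the condition.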

Paper Structure (35 sections, 39 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: $\mathrm{I}$: Weak-to-strong guidance principle: guidance methods serve as tools for improving generalization, and we propose SGG to combine the benefits of condition-dependent guidance (CDG) and condition-agnostic guidance (CAG). $\mathrm{II}$: Integration into the training framework, improving the generalization ability of unguided diffusion models.
  • Figure 2: Recursive toy example with varying class complexity and in-class distribution (granularity of the condition). 1st row: When the model is well fitted and the conditional information is blurry, CFG ho2021classifier exhibits mode-seeking behavior but lacks diversity. 2nd row: When the model is less well fitted and the conditional information is sharp, AG karras2024guiding improves diversity but produces outliers. 3rd row: In practice, SGG incorporates the mode-seeking capacity of CFG at high noise levels while applying AG at low noise levels to preserve the in-class distribution.
  • Figure 3: Applying guidance reduces the gap to the optimal velocity $\dot{\mathbf{v}}$. The error-correction effect of CFG is prominent at high noise levels, while that of AG is prominent at low noise levels.
  • Figure 4: $\mathrm{I}$: Two groups of weak-model construction: condition-dependent and condition-agnostic. $\mathrm{II}$: Segmented Guidance applied in training and sampling.
  • Figure 5: Qualitative comparison between Conditional (w/o guidance), CFG ho2021classifier, SLG hyung2025spatiotemporal, SGG (Ours).
  • ...and 7 more figures
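The segmented schedule described in the Figure 2 and Figure 3 captions (CFG-style guidance at high noise levels, AG-style guidance at low noise levels) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the switch point `t_switch`, the weights `w_cfg` and `w_ag`, and the convention that larger `t` means more noise are all hypothetical, and the model predictions are passed in as precomputed arrays rather than produced by real networks.

```python
import numpy as np


def extrapolate(v_main, v_weak, w):
    """Weak-to-strong extrapolation: start at the weak prediction and move
    w times the (main - weak) difference; w = 1 returns v_main unchanged."""
    return v_weak + w * (v_main - v_weak)


def segmented_guidance(v_cond, v_uncond, v_weak_model, t, t_switch, w_cfg, w_ag):
    """Hypothetical segmented rule (assumption: larger t = higher noise level).

    v_cond       -- main model's conditional prediction
    v_uncond     -- main model's unconditional prediction (CFG-style weak signal)
    v_weak_model -- degraded model's conditional prediction (AG-style weak signal)
    """
    if t >= t_switch:
        # High noise level: CFG-style guidance, weak signal drops the condition.
        return extrapolate(v_cond, v_uncond, w_cfg)
    # Low noise level: AG-style guidance, weak signal comes from a degraded model.
    return extrapolate(v_cond, v_weak_model, w_ag)


if __name__ == "__main__":
    v_c = np.array([1.0, 2.0])
    v_u = np.array([0.0, 0.0])
    v_b = np.array([0.5, 1.0])
    print(segmented_guidance(v_c, v_u, v_b, t=0.9, t_switch=0.5, w_cfg=2.0, w_ag=1.5))
    print(segmented_guidance(v_c, v_u, v_b, t=0.1, t_switch=0.5, w_cfg=2.0, w_ag=1.5))
```

In a real sampler one would compute only the weak prediction needed by the active branch at each step; both are passed in here to keep the sketch short. Setting either weight to 1 makes that segment fall back to the plain conditional prediction.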