Optimizing Few-Step Generation with Adaptive Matching Distillation

Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, Zeke Xie

TL;DR

This work identifies Forbidden Zones as regions where real-teacher guidance is unreliable and fake-teacher repulsion is insufficient, causing instability in Distribution Matching Distillation (DMD). It introduces Adaptive Matching Distillation (AMD), a reward-guided, self-correcting framework that detects these zones with a reward proxy, dynamically reweights per-sample gradient contributions through Dynamic Score Adaptation, and strengthens the repulsive landscape via Repulsive Landscape Sharpening. The authors reinterpret prior DMD variants within a unified optimization lens, showing they implicitly avoid corrupted regions, and demonstrate AMD's effectiveness across large-scale image and video generation tasks (e.g., SDXL, Wan2.1), outperforming state-of-the-art baselines on human-preference benchmarks such as HPSv2. Empirical results show improved sample fidelity and training robustness, enabling efficient few-step generation and better alignment with human preferences. The paper also outlines limitations and avenues for future refinement of reward-based diagnostics and adaptive interactions.

Abstract

Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions. Based on this insight, we introduce Adaptive Matching Distillation (AMD), a self-correcting mechanism that utilizes reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients via structural signal decomposition and introduces Repulsive Landscape Sharpening to enforce steep energy barriers against failure mode collapse. Extensive experiments across image and video generation tasks (e.g., SDXL, Wan2.1) and rigorous benchmarks (e.g., VBench, GenEval) demonstrate that AMD significantly enhances sample fidelity and training robustness. For instance, AMD improves the HPSv2 score on SDXL from 30.64 to 31.25, outperforming state-of-the-art baselines. These findings validate that explicitly rectifying optimization trajectories within Forbidden Zones is essential for pushing the performance ceiling of few-step generative models.

Paper Structure

This paper contains 71 sections, 4 theorems, 28 equations, 15 figures, 11 tables, 1 algorithm.

Key Result

Proposition 3.1

(See the Appendix for the proof.) Let $x = G_\theta(z)$ be a latent sample generated by the student. Under the first-order approximation, the parameter update $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}_{\mathrm{DMD}}$ induces an effective gradient descent step on the sample $x$ in the latent space, where $\mathcal{V}_{\mathrm{DMD}}(x)$ is the distillation potential defined as the contrastive ener…
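
The displayed update is not present in this excerpt, and the definition of the potential is cut off. As a hedged reading only, assuming the standard DMD gradient (fake score minus real score, pushed through the generator), writing $\tilde{\eta}$ for a hypothetical effective step size, and ignoring any preconditioning by the generator's Jacobian, the sample-space step would take a form such as:

$$
x \leftarrow x - \tilde{\eta}\, \nabla_x \mathcal{V}_{\mathrm{DMD}}(x),
\qquad
\nabla_x \mathcal{V}_{\mathrm{DMD}}(x) \;\propto\; s_{\text{fake}}(x) - s_{\text{real}}(x).
$$

Under this reading, the sample is pulled along $s_{\text{real}}$ and pushed against $s_{\text{fake}}$; the paper's exact statement and constants may differ.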

Figures (15)

  • Figure 1: Visualization of the Optimization Landscape in Forbidden Zones. (a) The Real Teacher's energy potential becomes undefined or poorly calibrated far from the data manifold, exerting misleading attractive forces. (b) The Fake Teacher's landscape exhibits a shallow slope, resulting in a weak repulsive force that is insufficient to propel the student out of the Forbidden Zone.
  • Figure 2: Visualizing the Taxonomy of Forbidden Zone Mitigation Strategies. (a) Standard Failure: Useful gradients exist only near modes, leaving a "gradient vacuum" in $\mathcal{Z}_{f}$. (b) External Force: An auxiliary force $\mathbf{F}_{\text{ext}}$ (green) provides global steering to bridge the gap. (c) Noise Reset: High noise induces distribution overlap, restoring gradient coverage. (d) Real Teacher Adaptation: The teacher manifold actively shifts towards the student to eliminate $\mathcal{Z}_{f}$.
  • Figure 3: Overview of AMD. The framework operates in three stages: (Left) Group Generation & Re-noising: For each prompt, the student $G_\theta$ produces a group of samples $\{x_i\}_{i=1}^K$, which are subsequently perturbed by the forward operator $\mathcal{F}_t$. (Middle) Reward-aware Diagnosis: A reward model $R(\cdot)$ serves as a proxy to pinpoint samples (e.g., $x_K$) trapped in the Forbidden Zone ($\mathcal{Z}_f$), where the real teacher's score $s_{\text{real}}$ becomes ill-posed. (Right) Dynamic Score Adaptation: Through the adaptive operator $\mathcal{H}_{AMD}$, AMD rectifies the combination of $\mathbf{d}_{real}$ and $\mathbf{d}_{fake}$ to derive an optimized gradient direction, thereby facilitating a rapid escape from the Forbidden Zone and enabling the model to surpass the teacher under reward guidance.
  • Figure 4: Qualitative comparison on text-to-video generation. Compared to standard DMD (e.g., LongLive), AMD demonstrates superior visual fidelity and more coherent motion realism. Additional qualitative results are provided in the supplementary material.
  • Figure 5: Component-wise ablation study on 50K-ImageNet (256$\times$256; base model: SiT-XL/2). We investigate the contribution of each component in AMD. Dynamic Adapt denotes the Dynamic Score Adaptation (Section 3.2), and Repulsive Sharpen denotes the Repulsive Landscape Sharpening (Section 3.3).
  • ...and 10 more figures
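
The diagnosis and adaptation stages described in the Figure 3 caption can be illustrated with a small sketch. This is not the paper's operator $\mathcal{H}_{AMD}$, whose form is not given here. Everything below is an assumption made for illustration: the function name `adaptive_direction`, the group-wise z-score used as the reward diagnostic, the threshold `tau`, the factor `boost`, and the convention that `d_real` is an attractive direction and `d_fake` a repulsive one that are combined by a weighted sum.

```python
from statistics import mean, pstdev


def adaptive_direction(d_real, d_fake, rewards, tau=-0.5, boost=2.0):
    """Hypothetical per-sample reweighting of the two distillation signals.

    d_real, d_fake: lists of K vectors (lists of floats), one per sample in
        the group generated for a single prompt.
    rewards: list of K reward-proxy scores for the same samples.
    tau: z-score threshold below which a sample is flagged (assumption).
    boost: factor that strengthens the repulsive term and weakens the
        attractive term for flagged samples (assumption).
    """
    # Standardize rewards within the group so the diagnosis is relative.
    mu, sigma = mean(rewards), pstdev(rewards)
    z = [(r - mu) / (sigma + 1e-8) for r in rewards]

    # Low relative reward is used as a proxy for "in the Forbidden Zone".
    flagged = [zi < tau for zi in z]

    directions = []
    for dr, df, bad in zip(d_real, d_fake, flagged):
        # Flagged samples trust the real teacher less and the repulsion more.
        w_real, w_fake = (1.0 / boost, boost) if bad else (1.0, 1.0)
        directions.append([w_real * a + w_fake * b for a, b in zip(dr, df)])
    return directions, flagged


if __name__ == "__main__":
    ones = [[1.0, 1.0] for _ in range(4)]
    dirs, flags = adaptive_direction(ones, ones, [1.0, 1.0, 1.0, -3.0])
    print(flags)
    print(dirs)
```

In this toy call only the last sample has a markedly lower reward than its group, so it is the only one whose attractive and repulsive terms are rebalanced; the other samples keep the plain, unweighted combination.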

Theorems & Definitions (8)

  • Proposition 3.1: Sample-space Gradient Descent Reformulation
  • Definition 3.2: Forbidden Zone $\mathcal{Z}_f$
  • proof
  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 3.3: External Force Domination