Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

Yucheng Zhou, Hao Li, Jianbing Shen

TL;DR

The paper addresses condition errors and condition inconsistency in autoregressive image generation with diffusion loss. A theoretical comparison with conditional diffusion shows that patch-based denoising stabilizes the condition distribution and that autoregressive condition generation makes the influence of condition errors decay exponentially. The paper also introduces an Optimal Transport (OT) based condition refinement, formulated as a Wasserstein gradient flow with a JKO scheme and Sinkhorn iterations, for which convergence to the ideal condition distribution is established theoretically. On ImageNet, the approach achieves state-of-the-art or competitive FID/IS across model scales and at high resolutions, and denoising metrics (signal-to-noise ratio and noise intensity) indicate robust condition refinement. Overall, the work combines theoretical analysis with a scalable OT-based algorithm and empirical validation.

Abstract

Recent studies have explored autoregressive models for image generation, with promising results, and have combined diffusion models with autoregressive frameworks to optimize image generation via diffusion losses. In this study, we present a theoretical analysis of diffusion and autoregressive models with diffusion loss, highlighting the latter's advantages. We present a theoretical comparison of conditional diffusion and autoregressive diffusion with diffusion loss, demonstrating that patch denoising optimization in autoregressive models effectively mitigates condition errors and leads to a stable condition distribution. Our analysis also reveals that autoregressive condition generation refines the condition, causing the condition error influence to decay exponentially. In addition, we introduce a novel condition refinement approach based on Optimal Transport (OT) theory to address "condition inconsistency". We theoretically demonstrate that formulating condition refinement as a Wasserstein Gradient Flow ensures convergence toward the ideal condition distribution, effectively mitigating condition inconsistency. Experiments demonstrate that our method outperforms both diffusion models and autoregressive models with diffusion loss.
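The Sinkhorn iterations named in the TL;DR can be illustrated with a minimal NumPy sketch of entropic optimal transport. This is not the paper's algorithm: the function names `sinkhorn_plan` and `barycentric_refine`, the uniform weights, the squared Euclidean cost, the cost normalization, and the barycentric projection used to move predicted conditions toward prior samples are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(a, b, cost, reg=0.5, n_iters=500):
    """Entropic-OT transport plan between histograms a (n,) and b (m,).

    Hypothetical helper for illustration; cost is an (n, m) matrix.
    """
    K = np.exp(-cost / reg)      # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match marginal a
        v = b / (K.T @ u)        # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def barycentric_refine(pred, prior, reg=0.5, n_iters=500):
    """Move each predicted condition toward prior samples (assumed scheme).

    pred:  (n, d) predicted condition vectors
    prior: (m, d) samples from a reference condition distribution
    """
    n, m = len(pred), len(prior)
    a = np.full(n, 1.0 / n)      # uniform weights on predictions
    b = np.full(m, 1.0 / m)      # uniform weights on prior samples
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((pred[:, None, :] - prior[None, :, :]) ** 2).sum(axis=-1)
    # Normalize so the kernel exp(-cost / reg) does not underflow.
    cost = cost / max(cost.max(), 1e-12)
    plan = sinkhorn_plan(a, b, cost, reg=reg, n_iters=n_iters)
    # Each refined vector is a plan-weighted average of prior samples.
    return (plan @ prior) / plan.sum(axis=1, keepdims=True)
```

In this sketch a smaller `reg` gives a sharper transport plan but slower convergence and a higher risk of numerical underflow in the kernel, which is why the cost is normalized before use.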

Paper Structure (61 sections, 15 theorems, 77 equations, 3 figures, 3 tables, 1 algorithm)

Key Result

Theorem 1

The standard score matching loss is upper-bounded by the conditional score matching loss. The proof, given in the paper's appendix, uses the law of total probability and Jensen's inequality.
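The two ingredients named in the theorem can be seen in a short derivation sketch. It is not the paper's proof: the notation ($s_\theta$, $p(x \mid c)$) and in particular the assumption that the unconditional score model is the posterior average of the conditional one, $s_\theta(x) = \mathbb{E}_{c \sim p(c \mid x)}[s_\theta(x, c)]$, are introduced here for illustration, and the gradient is assumed to commute with the integral over $c$.

```latex
\begin{align}
\nabla_x \log p(x)
  &= \frac{\nabla_x \int p(x \mid c)\, p(c)\, dc}{p(x)}
   = \int \frac{p(x \mid c)\, p(c)}{p(x)}\, \nabla_x \log p(x \mid c)\, dc
   = \mathbb{E}_{c \sim p(c \mid x)}\!\left[ \nabla_x \log p(x \mid c) \right], \\
\left\| s_\theta(x) - \nabla_x \log p(x) \right\|^2
  &= \left\| \mathbb{E}_{c \sim p(c \mid x)}\!\left[ s_\theta(x, c) - \nabla_x \log p(x \mid c) \right] \right\|^2
  \le \mathbb{E}_{c \sim p(c \mid x)} \left\| s_\theta(x, c) - \nabla_x \log p(x \mid c) \right\|^2 .
\end{align}
```

The first line is the law of total probability applied to $p(x)$; the inequality in the second line is Jensen's inequality for the convex function $\|\cdot\|^2$. Taking the expectation over $x \sim p(x)$ on both sides then bounds the standard loss by the conditional one under the stated assumption.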

Figures (3)

  • Figure 1: The autoregressive model predicts an initial condition, which is processed by the OT Refinement module using a sampled prior derived from Algorithm 1. The resulting refined condition then guides the Denoise MLP for latent generation.
  • Figure 2: Qualitative results on $256 \times 256$ ImageNet class-conditional generation. These images are generated by our method.
  • Figure 3: Analysis of Signal-to-Noise Ratio (SNR, Left) and Noise Intensity (Right) during the denoising process of our method and the baseline. All analyses are computed in the image space after VAE decoding.

Theorems & Definitions (24)

  • Theorem 1: Conditional Score Matching Upper Bound
  • Lemma 1: Expansion of Score Matching Loss
  • Definition 1: Conditional Error Term $\epsilon_c$
  • Definition 2: Simplified Conditional Error Term $\overline{\epsilon}_c$
  • Lemma 2: Uniqueness of Conditional Control Term
  • Proposition 1: Condition Refinement via Patch Denoising
  • Lemma 3: Markov Property [meyn2012markov, Bellet2006]
  • Lemma 4: Regularity of Conditional Probability [DU04]
  • Lemma 5: Bounded Derivative Theorem
  • Theorem 2: Descent of Gradient Norm in Autoregressive Process
  • ...and 14 more