Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss

Yucheng Zhou; Hao Li; Jianbing Shen

TL;DR

The paper addresses condition errors and condition inconsistency in autoregressive image generation with diffusion loss. It first provides a theoretical comparison with conditional diffusion, showing that patch-based denoising stabilizes the condition and that autoregressive conditioning makes the influence of condition errors decay exponentially. It then introduces an Optimal Transport (OT) based condition refinement formulated as a Wasserstein gradient flow, discretized with a JKO scheme and solved with Sinkhorn iterations, and shows theoretically that it converges toward the ideal condition distribution. On ImageNet, the approach achieves state-of-the-art or competitive FID and IS across model scales and at high resolutions, and denoising metrics (signal-to-noise ratio and noise intensity) indicate robust condition refinement. This work offers a principled framework for improving autoregressive diffusion methods, combining solid theory with scalable OT-based algorithms and strong empirical results.
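The exponential decay claim can be illustrated with a toy recurrence. The sketch below is not the paper's model: it assumes, purely for illustration, that each autoregressive step updates a scalar condition through a map that is LAM-Lipschitz in the previous condition (LAM < 1) and is driven by information from the newly generated patch. The update function, LAM, and the patch signals are hypothetical stand-ins. Under this contraction assumption, an initial condition error of size delta is bounded by LAM^k * delta after k steps.

```python
import math

LAM = 0.6  # assumed Lipschitz constant (< 1) of the condition update (hypothetical)


def update(c, patch_signal):
    """Hypothetical AR condition update: contractive in c, driven by the new patch."""
    return LAM * math.tanh(c) + patch_signal


def rollout(c0, patch_signals):
    """Run the recurrence from initial condition c0 and return every intermediate condition."""
    cs = [c0]
    for g in patch_signals:
        cs.append(update(cs[-1], g))
    return cs


signals = [0.3 * math.sin(k) for k in range(12)]  # stand-in for generated-patch information
clean = rollout(0.0, signals)
noisy = rollout(0.5, signals)  # same process, but with an initial condition error of 0.5

for k, (a, b) in enumerate(zip(clean, noisy)):
    gap = abs(a - b)
    bound = LAM ** k * 0.5
    print(f"step {k:2d}: |error| = {gap:.2e}  <= LAM^k * 0.5 = {bound:.2e}")
```

Because tanh is 1-Lipschitz, the bound follows only from the assumed constant LAM; it is not a reproduction of the paper's proof.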

Abstract

Recent studies have explored autoregressive models for image generation with promising results, and have combined diffusion models with autoregressive frameworks to optimize image generation via diffusion losses. In this study, we present a theoretical analysis of diffusion models and autoregressive models with diffusion loss, highlighting the advantages of the latter. Specifically, we compare conditional diffusion with autoregressive diffusion trained with diffusion loss, demonstrating that patch denoising optimization in autoregressive models effectively mitigates condition errors and leads to a stable condition distribution. Our analysis also reveals that autoregressive condition generation refines the condition, causing the influence of condition errors to decay exponentially. In addition, we introduce a novel condition refinement approach based on Optimal Transport (OT) theory to address "condition inconsistency". We theoretically demonstrate that formulating condition refinement as a Wasserstein Gradient Flow ensures convergence toward the ideal condition distribution, effectively mitigating condition inconsistency. Experiments demonstrate the superiority of our method over diffusion models and over autoregressive models with diffusion loss.
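To make the OT refinement idea concrete, here is a minimal sketch of how a Wasserstein gradient flow toward a sampled prior could be simulated with Sinkhorn iterations. It is our illustration under stated assumptions, not the paper's Algorithm 1: conditions and prior samples are uniform empirical measures in R^2, the cost is squared Euclidean distance, the driving functional is assumed to be F(rho) = W_2^2(rho, prior) / 2, and an explicit (forward-Euler) particle step replaces the implicit proximal step that a JKO scheme would solve. The helpers sinkhorn_plan, transport_cost, and refine_step and the parameters eps and tau are hypothetical names.

```python
import math
import random


def sinkhorn_plan(xs, ys, eps=0.5, iters=100):
    """Entropic OT plan between uniform empirical measures on xs and ys.

    xs, ys: lists of equal-dimension tuples; cost is squared Euclidean distance.
    Returns (plan, cost) as nested lists of shape len(xs) x len(ys).
    """
    n, m = len(xs), len(ys)
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Sinkhorn scaling: match row marginals to 1/n, then column marginals to 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, cost


def transport_cost(xs, ys, eps=0.5):
    """<P, C> under the entropic plan: a smoothed proxy for W_2^2."""
    plan, cost = sinkhorn_plan(xs, ys, eps)
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


def refine_step(xs, ys, tau=0.5, eps=0.5):
    """Explicit particle step x_i <- x_i - tau * (x_i - T(x_i)).

    T is the barycentric projection of the entropic plan, so x_i - T(x_i)
    approximates the Wasserstein gradient of the assumed F(rho) = W_2^2(rho, prior) / 2.
    """
    plan, _ = sinkhorn_plan(xs, ys, eps)
    dim = len(xs[0])
    refined = []
    for i, x in enumerate(xs):
        mass = sum(plan[i])
        target = [sum(p * y[d] for p, y in zip(plan[i], ys)) / mass for d in range(dim)]
        refined.append(tuple(xd - tau * (xd - td) for xd, td in zip(x, target)))
    return refined


rng = random.Random(0)
# Hypothetical data: 32 prior samples and 32 "predicted" conditions shifted away from them.
prior = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(32)]
conds = [(rng.gauss(2.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(32)]

history = [transport_cost(conds, prior)]
for _ in range(5):
    conds = refine_step(conds, prior)
    history.append(transport_cost(conds, prior))
print("transport cost per step:", [round(h, 3) for h in history])
```

The entropic regularization (eps) keeps the plan dense and cheap to compute; smaller eps tracks exact OT more closely but needs more Sinkhorn iterations. A production version would work in the log domain and use an OT library rather than nested Python lists.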

Paper Structure (61 sections, 15 theorems, 77 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Diffusion Modeling.
Autoregressive Modeling.
Theoretical Analysis on Autoregressive Image Modeling with Diffusion Loss
Difference of Diffusion Models
Conditional Diffusion Modeling.
Autoregressive Modeling with Diffusion Loss.
Conditional Denoising Model Error Definition
Conditional Score Matching as an Upper Bound.
Error in Conditional Score Matching.
Conditional Control Term Analysis.
Condition Refinement through Patch Denoising.
Autoregressive Modeling Can Refine Condition
Autoregressive Condition Optimization
...and 46 more sections

Key Result

Theorem 1

The standard score matching loss is upper-bounded by the conditional score matching loss. The proof, given in the appendix, uses the law of total probability and Jensen's inequality.
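The two ingredients named in the proof can be checked numerically. The sketch below is our own illustration, not the paper's exact statement: it assumes a hypothetical 1-D Gaussian mixture in which each condition c has p(x | c) = N(MU[c], SIGMA^2), and a condition-agnostic score model s(x). By the law of total probability, the marginal score is the posterior average of the conditional scores; Jensen's inequality then gives (s - E[a])^2 <= E[(s - a)^2] at every x, so the standard loss cannot exceed the conditional one. The names PI, MU, SIGMA, and predictor are hypothetical.

```python
import math
import random

# Hypothetical 1-D setup: condition c in {0, 1} with prior PI[c] and
# conditional density p(x | c) = N(x; MU[c], SIGMA**2).
PI = [0.3, 0.7]
MU = [-2.0, 1.5]
SIGMA = 1.0


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def cond_score(x, c):
    """Conditional score d/dx log p(x | c) of the Gaussian component."""
    return -(x - MU[c]) / SIGMA ** 2


def posterior(x):
    """p(c | x) via Bayes' rule."""
    joint = [PI[c] * normal_pdf(x, MU[c], SIGMA) for c in range(2)]
    total = sum(joint)
    return [j / total for j in joint]


def marginal_score(x):
    """Law of total probability: d/dx log p(x) = E_{c|x}[d/dx log p(x | c)]."""
    w = posterior(x)
    return sum(w[c] * cond_score(x, c) for c in range(2))


def predictor(x):
    """Arbitrary condition-agnostic score model s(x) (hypothetical)."""
    return -0.5 * x


def losses(n=20000, seed=0):
    """Monte Carlo estimates of the standard and conditional score matching losses."""
    rng = random.Random(seed)
    std_loss = cond_loss = 0.0
    for _ in range(n):
        c = 0 if rng.random() < PI[0] else 1
        x = rng.gauss(MU[c], SIGMA)
        s = predictor(x)
        std_loss += (s - marginal_score(x)) ** 2
        # Average over p(c | x) exactly; by the tower property this has the
        # same expectation as using the sampled c, but less variance.
        w = posterior(x)
        cond_loss += sum(w[k] * (s - cond_score(x, k)) ** 2 for k in range(2))
    return std_loss / n, cond_loss / n


std_loss, cond_loss = losses()
print(f"standard SM loss    = {std_loss:.4f}")
print(f"conditional SM loss = {cond_loss:.4f}")
```

The gap between the two losses equals the posterior variance of the conditional scores, averaged over x; it vanishes only where the condition is fully determined by x.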

Figures (3)

Figure 1: The autoregressive model predicts an initial condition, which is processed by the OT Refinement module using a sampled prior derived from Algorithm 1. The resulting refined condition then guides the Denoise MLP for latent generation.
Figure 2: Qualitative results on $256 \times 256$ ImageNet class-conditional generation. All images are generated by our method.
Figure 3: Analysis of Signal-to-Noise Ratio (SNR, Left) and Noise Intensity (Right) during the denoising process of our method and the baseline. All analyses are computed in the image space after VAE decoding.

Theorems & Definitions (24)

Theorem 1: Conditional Score Matching Upper Bound
Lemma 1: Expansion of Score Matching Loss
Definition 1: Conditional Error Term $\epsilon_c$
Definition 2: Simplified Conditional Error Term $\overline{\epsilon}_c$
Lemma 2: Uniqueness of Conditional Control Term
Proposition 1: Condition Refinement via Patch Denoising
Lemma 3: Markov Property (meyn2012markov; Bellet2006)
Lemma 4: Regularity of Conditional Probability (DU04)
Lemma 5: Bounded Derivative Theorem
Theorem 2: Descent of Gradient Norm in Autoregressive Process
...and 14 more
