Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
Yucheng Zhou, Hao Li, Jianbing Shen
TL;DR
The paper addresses condition errors and condition inconsistency in autoregressive image generation with diffusion loss. A theoretical comparison with conditional diffusion shows that patch-based denoising stabilizes the condition distribution and that autoregressive condition generation makes the influence of condition errors decay exponentially. The paper introduces an Optimal Transport (OT)–based condition refinement formulated as a Wasserstein gradient flow, using a JKO scheme and Sinkhorn iterations, and shows theoretically that it converges to the ideal condition distribution. The approach is validated on ImageNet, achieving state-of-the-art or competitive FID/IS across model scales and at high resolutions, with denoising metrics indicating robust condition refinement. The result is a principled framework for improving autoregressive diffusion methods that pairs theory with scalable OT-based algorithms and strong empirical results.
Abstract
Recent studies have explored autoregressive models for image generation, with promising results, and have combined diffusion models with autoregressive frameworks to optimize image generation via diffusion losses. In this study, we present a theoretical analysis of conditional diffusion models and autoregressive models with diffusion loss, highlighting the latter's advantages. The comparison demonstrates that patch denoising optimization in autoregressive models effectively mitigates condition errors and leads to a stable condition distribution. Our analysis also reveals that autoregressive condition generation refines the condition, causing the influence of condition errors to decay exponentially. In addition, we introduce a novel condition refinement approach based on Optimal Transport (OT) theory to address "condition inconsistency". We theoretically demonstrate that formulating condition refinement as a Wasserstein Gradient Flow ensures convergence toward the ideal condition distribution, effectively mitigating condition inconsistency. Experiments demonstrate the superiority of our method over both diffusion models and autoregressive models with diffusion loss.
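For readers unfamiliar with the Sinkhorn iterations mentioned in the TL;DR, the following is a minimal sketch of the generic entropy-regularized OT solver between two discrete distributions. It is not the authors' implementation: the function name `sinkhorn`, the parameters `eps` and `n_iters`, and the toy squared-distance cost are all assumptions made for illustration, and the summary above does not specify how the paper defines its cost or its distributions.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Approximate an entropy-regularized OT plan between histograms a and b.

    a: (n,) source weights, positive, summing to 1
    b: (m,) target weights, positive, summing to 1
    cost: (n, m) pairwise transport cost
    eps: regularization strength (illustrative default, an assumption)
    """
    K = np.exp(-cost / eps)      # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match the source marginal
        v = b / (K.T @ u)        # rescale columns to match the target marginal
    return u[:, None] * K * v[None, :]


# Toy example (hypothetical data): move mass between two 1-D point sets.
x = np.linspace(0.0, 1.0, 4)
y = np.linspace(0.0, 1.0, 5)
a = np.full(4, 0.25)
b = np.full(5, 0.2)
cost = (x[:, None] - y[None, :]) ** 2
plan = sinkhorn(a, b, cost)
```

The returned `plan` is a nonnegative matrix whose row and column sums approximate `a` and `b`. Smaller `eps` brings the plan closer to unregularized OT but slows convergence and can underflow the kernel, which is why practical solvers often work in the log domain.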
