LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, Dongmei Zhang
TL;DR
LayoutDiffusion reframes graphic layout generation as a discrete diffusion problem over heterogeneous token sequences, introducing a carefully designed mild forward process that preserves legality, leverages coordinate proximity, and mitigates type disruption. The forward process uses a block-wise transition matrix and a piecewise linear noise schedule, while a Transformer-driven reverse process learns $p_\theta(\textbf{x}_0|\textbf{x}_t)$ to iteratively refine layouts. This approach enables strong unconditional generation and plug-and-play conditional generation (refinement and type-conditioned generation) without re-training, achieving state-of-the-art results on RICO and PubLayNet and demonstrating robustness to noise and improved diversity. The work advances the application of diffusion models to heterogeneous data and suggests broad potential for conditional tasks in layout design and related domains.
Abstract
Creating graphic layouts is a fundamental step in graphic design. In this work, we present a novel generative model named LayoutDiffusion for automatic layout generation. As a layout is typically represented as a sequence of discrete tokens, LayoutDiffusion models layout generation as a discrete denoising diffusion process. It learns to reverse a mild forward process, in which layouts become increasingly chaotic as the number of forward steps grows, while layouts in neighboring steps differ only slightly. Designing such a mild forward process is, however, very challenging, because a layout has both categorical and ordinal attributes. To tackle this challenge, we identify three critical factors for achieving a mild forward process for layouts: legality, coordinate proximity, and type disruption. Based on these factors, we propose a block-wise transition matrix coupled with a piece-wise linear noise schedule. Experiments on the RICO and PubLayNet datasets show that LayoutDiffusion significantly outperforms state-of-the-art approaches. Moreover, it enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
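
To make the idea of a block-wise transition matrix with a piece-wise linear schedule more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a vocabulary of `n_coord` coordinate tokens followed by `n_types` type tokens and one absorbing mask token. The exponential distance decay in the coordinate block, the mask-based handling of type tokens, and every function name and parameter (`coord_block`, `block_transition`, `schedule`, `beta`, `gamma`, `t_switch`, `beta_max`) are illustrative assumptions that the summary above does not specify.

```python
import math


def coord_block(n, beta):
    """Row-stochastic n x n block for coordinate tokens (assumed form).

    Off-diagonal weight decays with the distance |i - j|, so a coordinate
    is more likely to move to a nearby value (coordinate proximity).
    """
    rows = []
    for i in range(n):
        weights = [1.0 if i == j else beta * math.exp(-abs(i - j)) for j in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def block_transition(n_coord, n_types, beta, gamma):
    """Block-diagonal one-step transition matrix Q, where Q[i][j] = P(j | i).

    Off-diagonal blocks stay zero, so a coordinate token never turns into a
    type token or vice versa (legality). Type tokens either stay unchanged or
    jump to the mask token with probability gamma (an assumption of this sketch).
    """
    size = n_coord + n_types + 1
    mask = size - 1
    q = [[0.0] * size for _ in range(size)]
    block = coord_block(n_coord, beta)
    for i in range(n_coord):
        for j in range(n_coord):
            q[i][j] = block[i][j]
    for k in range(n_types):
        i = n_coord + k
        q[i][i] = 1.0 - gamma
        q[i][mask] = gamma
    q[mask][mask] = 1.0  # the mask token is absorbing
    return q


def schedule(t, total_steps, t_switch, beta_max=0.5):
    """Hypothetical piece-wise linear schedule returning (beta, gamma).

    Phase 1 (t <= t_switch): only coordinates are perturbed, types stay intact.
    Phase 2 (t > t_switch): coordinate noise is held fixed, type masking ramps to 1.
    """
    if t <= t_switch:
        return beta_max * t / t_switch, 0.0
    return beta_max, (t - t_switch) / (total_steps - t_switch)


if __name__ == "__main__":
    beta, gamma = schedule(t=80, total_steps=100, t_switch=50)
    for row in block_transition(n_coord=4, n_types=2, beta=beta, gamma=gamma):
        print([round(p, 3) for p in row])
```

In this sketch, keeping `gamma` at zero during the first phase is one way to delay type disruption until the later forward steps, while the zero off-diagonal blocks keep every intermediate sequence legal.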
