LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models

Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, Dongmei Zhang

TL;DR

LayoutDiffusion reframes graphic layout generation as a discrete diffusion problem over heterogeneous token sequences, introducing a carefully designed mild forward process that preserves legality, leverages coordinate proximity, and mitigates type disruption. The forward process uses a block-wise transition matrix and a piecewise linear noise schedule, while a Transformer-driven reverse process learns $p_\theta(\textbf{x}_0|\textbf{x}_t)$ to iteratively refine layouts. This approach enables strong unconditional generation and plug-and-play conditional generation (refinement and type-conditioned generation) without re-training, achieving state-of-the-art results on RICO and PubLayNet and demonstrating robustness to noise and improved diversity. The work advances the application of diffusion models to heterogeneous data and suggests broad potential for conditional tasks in layout design and related domains.

Abstract

Creating graphic layouts is a fundamental step in graphic designs. In this work, we present a novel generative model named LayoutDiffusion for automatic layout generation. As layout is typically represented as a sequence of discrete tokens, LayoutDiffusion models layout generation as a discrete denoising diffusion process. It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps and layouts in the neighboring steps do not differ too much. Designing such a mild forward process is however very challenging as layout has both categorical attributes and ordinal attributes. To tackle the challenge, we summarize three critical factors for achieving a mild forward process for the layout, i.e., legality, coordinate proximity and type disruption. Based on the factors, we propose a block-wise transition matrix coupled with a piece-wise linear noise schedule. Experiments on RICO and PubLayNet datasets show that LayoutDiffusion outperforms state-of-the-art approaches significantly. Moreover, it enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
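
To make the block-wise transition matrix and the piecewise linear schedule more concrete, here is a minimal sketch in Python. It is not the authors' implementation. The token ordering (coordinate tokens, then type tokens, then a single MASK token), the row-normalized Gaussian kernel used for coordinate proximity, and all hyperparameter values (`sigma`, `t_start`, the vocabulary sizes) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def piecewise_linear(t, T, t_start, v_max):
    """Return 0 up to t_start, then ramp linearly to v_max at t = T."""
    if t <= t_start:
        return 0.0
    return v_max * (t - t_start) / (T - t_start)

def block_transition_matrix(n_coord, n_type, sigma, gamma):
    """Row-stochastic matrix over [coordinate tokens | type tokens | MASK].

    Assumed design: coordinates only move to other coordinates (nearby
    values are more likely), and types either stay or are absorbed into MASK.
    The zero off-diagonal blocks keep a coordinate from becoming a type.
    """
    V = n_coord + n_type + 1
    mask = V - 1
    Q = np.zeros((V, V))

    # Coordinate block: Gaussian kernel in token index, normalized per row.
    idx = np.arange(n_coord)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    Q[:n_coord, :n_coord] = kernel / kernel.sum(axis=1, keepdims=True)

    # Type block: keep the type with probability 1 - gamma, else go to MASK.
    t_idx = np.arange(n_coord, n_coord + n_type)
    Q[t_idx, t_idx] = 1.0 - gamma
    Q[t_idx, mask] = gamma
    Q[mask, mask] = 1.0  # MASK is absorbing
    return Q

def corrupt(tokens, Q, rng):
    """Apply one forward step: sample each token's successor from its row of Q."""
    return np.array([rng.choice(Q.shape[0], p=Q[tok]) for tok in tokens])

# Toy run: one element encoded as [type, x, y, w, h].
rng = np.random.default_rng(0)
n_coord, n_type, T = 8, 3, 10
x = np.array([8, 1, 2, 5, 6])  # token 8 is the first type token
for t in range(1, T + 1):
    gamma = piecewise_linear(t, T, t_start=6, v_max=1.0)
    Q = block_transition_matrix(n_coord, n_type, sigma=0.7, gamma=gamma)
    x = corrupt(x, Q, rng)
    print(t, x)
```

In this toy run the type token stays untouched for the first six steps while the coordinates drift to nearby values, and the type is only masked in the late steps, which mirrors the behavior the abstract describes as a mild forward process.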

Paper Structure (37 sections, 12 equations, 20 figures, 9 tables, 1 algorithm)

Figures (20)

  • Figure 1: Comparison of different forward corruption processes. We sample the layouts at 0, 1/6, 2/6, 3/6, 4/6, 5/6, and 1 of the total number of timesteps. A blank page is shown when the format of the layout sequence is destroyed.
  • Figure 2: An illustration for LayoutDiffusion. In the forward process, the coordinates are mildly corrupted into stationary distribution, and the element types are absorbed into MASK in the late stage. In the reverse process, the element types are first recovered, and then the rough coordinates are gradually refined. For brevity, only two elements are shown, while the other elements and the special tokens are omitted.
  • Figure 3: Qualitative comparison against the strongest baselines selected by FID (better viewed in color and at 2$\times$ zoom). The first three rows are for RICO and the last three are for PubLayNet. LayoutDiffusion generates high-quality and diverse layouts. Layouts from LayoutFormer++ either lack diversity (Un-Gen) or are flawed (Gen-Type and Refinement). Layouts from other methods frequently misalign and overlap.
  • Figure 4: Results of the user study. For each model, we count how many people prefer the layouts generated from this model. The study shows that the results generated by LayoutDiffusion were favored by users over the other methods, particularly in terms of diversity.
  • Figure 5: Reverse denoising process for unconditional generation on RICO (from left to right). Each row is for one model. The blank page is used when the generated layout sequence is invalid.
  • ...and 15 more figures