Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

Jian Chen; Ruiyi Zhang; Yufan Zhou; Rajiv Jain; Zhiqiang Xu; Ryan Rossi; Changyou Chen

TL;DR

We address controllable layout generation, a task in which diffusion-based models often struggle with alignment. We propose LACE, a unified continuous-diffusion framework that generates both geometric and categorical layout attributes. It incorporates differentiable aesthetic constraints for alignment and overlap, a time-dependent constraint weight, and masking-based conditional generation. Empirical results on PubLayNet and Rico show state-of-the-art performance across unconditional, conditional, completion, and refinement tasks, with post-processing further improving alignment without degrading FID. The work advances practical, high-quality layout generation by combining continuous-space diffusion with explicit aesthetic constraints and a flexible conditioning mechanism, though it is limited to rectangular elements and a fixed label set. Future work could extend to arbitrary shapes and content-aware conditioning to broaden applicability to real-world design tasks.

Abstract

Controllable layout generation refers to the process of creating a plausible visual arrangement of elements within a graphic design (e.g., document and web designs) with constraints representing design intentions. Although recent diffusion-based models have achieved state-of-the-art FID scores, they tend to exhibit more pronounced misalignment than earlier transformer-based models. In this work, we propose the $\textbf{LA}$yout $\textbf{C}$onstraint diffusion mod$\textbf{E}$l (LACE), a unified model that handles a broad range of layout generation tasks, such as arranging elements with specified attributes and refining or completing a coarse layout design. The model is based on continuous diffusion models. In contrast to existing methods that use discrete diffusion models, a continuous state-space design enables the incorporation of differentiable aesthetic constraint functions in training. For conditional generation, we introduce conditions via masked input. Extensive experimental results show that LACE produces high-quality layouts and outperforms existing state-of-the-art baselines.

Paper Structure (30 sections, 10 equations, 10 figures, 5 tables)

Introduction
Methodology
Preliminary: Continuous Diffusion Models
Continuous Layout Generation
Conditional Generation
Reconstruction and Aesthetic Constraints
Alignment constraint
Overlap constraint
Time-dependent constraint weight
Post-processing
Related Work
Layout Generation
Diffusion-based layout generation
Experiments
Experiment Setup
...and 15 more sections

Figures (10)

Figure 1: Comparisons of latent states in continuous and discrete diffusion for layout generation. Discrete diffusion adds elements to a blank canvas incrementally, and the added elements remain fixed. Continuous diffusion gradually refines a random layout into an organized one over time.
Figure 2: Overview of the layout generation model. The figure shows the generation of a layout with up to $15$ elements from $5$ classes. The layout is padded to a fixed length using padding elements, which are represented by an extra class in the $6$-dimensional one-hot vector. The bounding box uses a $4$-dimensional continuous vector. Dashed lines represent multi-step processes and solid lines represent a single step. The forward process corrupts data with Gaussian noise, while the reverse process trains a neural network to denoise the noisy latent $\mathbf{x}_t$ and its three augmentations using condition masks. Predicted bounding boxes are used to compute the constraint loss against real ones.
Figure 3: Qualitative comparison between LACE and LayoutDM in conditional generation tasks.
Figure B.1: Time-dependent constraint weight and Mean Pairwise IoU in the forward process.
Figure B.2: Examples of convergence to a local minimum with alignment and overlap constraints.
...and 5 more figures
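
The element representation described in the Figure 2 caption (up to 15 elements, 5 classes plus one padding class, a 6-dimensional one-hot label and a 4-dimensional box per element) can be illustrated with a small sketch. This is not the authors' code: the function name `encode_layout`, the position of the padding class (last index), the zero box used for padding rows, and the normalized `(cx, cy, w, h)` box convention are all assumptions made for illustration.

```python
from typing import List, Tuple

MAX_ELEMENTS = 15        # maximum layout length, from the Figure 2 caption
NUM_CLASSES = 5          # number of real element classes, from the Figure 2 caption
PAD_CLASS = NUM_CLASSES  # ASSUMPTION: the extra padding class takes the last index

# ASSUMPTION: a box is (cx, cy, w, h), each normalized to [0, 1].
Box = Tuple[float, float, float, float]
Element = Tuple[int, Box]


def encode_layout(elements: List[Element]) -> List[List[float]]:
    """Encode a layout as MAX_ELEMENTS rows of 10 floats each.

    Each row is a 6-dim one-hot label followed by a 4-dim bounding box.
    Layouts shorter than MAX_ELEMENTS are padded with padding-class rows.
    """
    if len(elements) > MAX_ELEMENTS:
        raise ValueError("layout has more than MAX_ELEMENTS elements")

    rows: List[List[float]] = []
    for label, box in elements:
        if not 0 <= label < NUM_CLASSES:
            raise ValueError(f"label {label} is not a real class index")
        one_hot = [0.0] * (NUM_CLASSES + 1)
        one_hot[label] = 1.0
        rows.append(one_hot + [float(v) for v in box])

    # Pad to a fixed length; the zero box for padding rows is an ASSUMPTION.
    while len(rows) < MAX_ELEMENTS:
        one_hot = [0.0] * (NUM_CLASSES + 1)
        one_hot[PAD_CLASS] = 1.0
        rows.append(one_hot + [0.0, 0.0, 0.0, 0.0])
    return rows


# Usage: a two-element layout becomes a fixed-size 15 x 10 array.
layout = encode_layout([
    (0, (0.5, 0.1, 0.8, 0.1)),
    (2, (0.5, 0.5, 0.8, 0.6)),
])
```

Because every row is a plain real-valued vector, Gaussian noise can be added to labels and boxes alike, which is what lets a continuous diffusion model treat both attribute types in one state space.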
