LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

TL;DR

LayoutDM introduces a discrete diffusion framework for controllable layout generation, using modality-specific diffusion, adaptive quantization, and decoupled positional encoding to model variable-length layouts. The model supports strong and weak inference-time constraints via masking and logit adjustment, enabling a single model to tackle diverse conditional tasks without retraining. Empirical results on Rico and PubLayNet demonstrate competitive or superior performance across six tasks, with ablations validating the importance of component choices and a favorable speed–quality trade-off. This approach provides a scalable, task-agnostic solution for structured layout generation with practical implications for UI and document design.

Abstract

Controllable layout generation aims at synthesizing plausible arrangements of element bounding boxes with optional constraints, such as the type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks with a single model based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles structured layout data in a discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For conditional generation, we propose to inject layout constraints in the form of masking or logit adjustment during inference. We show in the experiments that LayoutDM successfully generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.
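To make the two constraint mechanisms named in the abstract concrete, the following is a minimal sketch of one constrained sampling step, not the authors' implementation. It assumes that masking means holding known tokens fixed, and that logit adjustment means adding a weighted prior to the predicted logits before sampling. The names `toy_denoiser`, `constrained_step`, `prior`, and `weight` are hypothetical, and the denoiser is a stand-in that returns uniform logits.

```python
import math
import random

MASK = -1  # hypothetical id for a blank (not yet generated) token


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_denoiser(tokens, vocab_size):
    """Stand-in for a trained network: uniform (all-zero) logits per position."""
    return [[0.0] * vocab_size for _ in tokens]


def constrained_step(tokens, known, vocab_size, prior=None, weight=1.0, rng=random):
    """Sample one token per position under strong and weak constraints.

    known[i] is a token id to keep fixed (strong constraint) or None.
    prior[i] is a list of per-token scores added to the logits (weak
    constraint); the additive form is an assumption of this sketch.
    """
    logits = toy_denoiser(tokens, vocab_size)
    out = []
    for i, row in enumerate(logits):
        if known[i] is not None:
            # Strong constraint: the known token overrides the model's prediction.
            out.append(known[i])
            continue
        if prior is not None:
            # Weak constraint: shift the logits toward tokens the prior prefers.
            row = [l + weight * p for l, p in zip(row, prior[i])]
        probs = softmax(row)
        out.append(rng.choices(range(vocab_size), weights=probs, k=1)[0])
    return out


if __name__ == "__main__":
    known = [2, None, None]  # first token is given by the user
    prior = [[0.0] * 4, [-1e9, -1e9, -1e9, 0.0], [0.0] * 4]
    print(constrained_step([MASK] * 3, known, 4, prior=prior))
```

In this sketch the first position always keeps its given token, the second is pushed to token 3 by the prior, and the third is sampled freely. A real sampler would repeat such a step over the diffusion timesteps with a trained denoiser.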

Paper Structure

This paper contains 41 sections, 10 equations, 21 figures, 7 tables.

Figures (21)

  • Figure 1: Overview of LayoutDM. Top: LayoutDM is trained to gradually generate a complete layout from a blank state in discrete state space. Bottom: During sampling, we can steer LayoutDM to perform various conditional generation tasks without additional training or external models.
  • Figure 2: Overview of the corruption and denoising processes in LayoutDM. For simplicity, we use a toy layout consisting of two elements, and the model generates at most three elements.
  • Figure 3: Comparison in conditional generation given partially known fields.
  • Figure 4: Qualitative comparison in the refinement task.
  • Figure 5: Quality-violation trade-off in the relationship task. Lower scores indicate better performance for both metrics.
  • ...and 16 more figures