LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi
TL;DR
LayoutDM introduces a discrete diffusion framework for controllable layout generation that uses modality-specific diffusion, adaptive quantization, and decoupled positional encoding to model variable-length layouts. The model supports strong and weak inference-time constraints via masking and logit adjustment, respectively, so a single model can handle diverse conditional tasks without retraining. Experiments on Rico and PubLayNet show competitive or superior performance across six tasks, ablations validate the component choices, and the model offers a favorable speed–quality trade-off. Because one trained model covers many conditional tasks, the approach offers a task-agnostic alternative to task-specific models for UI and document layout design.
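The TL;DR does not spell out how adaptive quantization works. The sketch below assumes it means placing the discrete bins of each geometric attribute according to the data distribution rather than on a uniform grid, and it uses 1-D k-means (Lloyd's algorithm) to find the bin centers. Both the choice of k-means and the helper names `kmeans_1d`, `quantize`, and `dequantize` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kmeans_1d(values, n_bins, n_iters=50):
    """Fit n_bins 1-D cluster centers to `values` with Lloyd's algorithm.

    Hypothetical helper: assumes adaptive quantization ~ data-driven bin centers.
    """
    values = np.asarray(values, dtype=np.float64)
    # Initialize centers at evenly spaced quantiles so dense regions get more bins.
    centers = np.quantile(values, np.linspace(0.0, 1.0, n_bins))
    for _ in range(n_iters):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        for k in range(n_bins):
            members = values[labels == k]
            if members.size > 0:
                centers[k] = members.mean()
    return np.sort(centers)

def quantize(values, centers):
    """Map continuous values to the index of the nearest center (a discrete token)."""
    values = np.asarray(values, dtype=np.float64)
    return np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)

def dequantize(tokens, centers):
    """Map token indices back to continuous values (the cluster centers)."""
    return centers[np.asarray(tokens)]

# Toy data: normalized x-coordinates clustered around two common alignments.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.1, 0.01, 500), rng.normal(0.5, 0.01, 500)])
xs = np.clip(xs, 0.0, 1.0)

centers = kmeans_1d(xs, n_bins=8)
tokens = quantize(xs, centers)
recon = dequantize(tokens, centers)
print("max reconstruction error:", np.abs(recon - xs).max())
```

With the same number of bins, a uniform grid would spend most of its tokens on empty regions of [0, 1], whereas data-driven centers concentrate resolution where element coordinates actually occur.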
Abstract
Controllable layout generation aims to synthesize plausible arrangements of element bounding boxes under optional constraints, such as the type or position of a specific element. In this work, we aim to solve a broad range of layout generation tasks with a single model based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles structured layout data in a discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process with modality-wise discrete diffusion. For conditional generation, we propose injecting layout constraints during inference in the form of masking or logit adjustment. Our experiments show that LayoutDM generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.
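The abstract names two mechanisms: a modality-wise corruption process and constraint injection at inference. The toy NumPy sketch below illustrates both under stated assumptions. It assumes a flattened token sequence in which each position belongs to one modality (here category, x, y) with its own contiguous vocabulary range, plus a shared [MASK] token. One corruption step keeps a token, replaces it with [MASK] (probability `gamma`), or resamples it uniformly within its own modality (probability `alpha`). At inference, a strong constraint fixes a position to a known token by masking every other logit, and a weak constraint adds a bias `lam` to the logits of preferred tokens. The vocabulary layout, the probabilities, and the names `MODALITY_RANGES`, `MASK_ID`, `corrupt_step`, and `constrain_logits` are hypothetical, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical vocabulary layout: each modality owns a contiguous token range,
# and one extra id serves as a shared [MASK] token.
MODALITY_RANGES = {"category": (0, 5), "x": (5, 13), "y": (13, 21)}
MASK_ID = 21
VOCAB_SIZE = 22

def corrupt_step(tokens, modalities, alpha, gamma, rng):
    """One forward corruption step, applied independently per position.

    With prob. gamma a token becomes [MASK]; with prob. alpha it is resampled
    uniformly from its own modality's range; otherwise it is kept.
    Masked tokens stay masked (absorbing state).
    """
    out = tokens.copy()
    for i, (tok, mod) in enumerate(zip(tokens, modalities)):
        if tok == MASK_ID:
            continue
        u = rng.random()
        if u < gamma:
            out[i] = MASK_ID
        elif u < gamma + alpha:
            lo, hi = MODALITY_RANGES[mod]
            out[i] = rng.integers(lo, hi)  # never crosses into another modality
    return out

def constrain_logits(logits, modalities, fixed=None, bias=None):
    """Inject constraints into per-position logits of shape (L, VOCAB_SIZE).

    - Each position may only predict tokens of its own modality (hard mask).
    - fixed: {position: token} strong constraints; every other token gets -inf.
    - bias: {position: (token_ids, lam)} weak constraints; adds lam to those logits.
    """
    out = logits.astype(np.float64).copy()
    for i, mod in enumerate(modalities):
        lo, hi = MODALITY_RANGES[mod]
        allowed = np.zeros(VOCAB_SIZE, dtype=bool)
        allowed[lo:hi] = True
        out[i, ~allowed] = -np.inf
    for i, tok in (fixed or {}).items():
        out[i, :] = -np.inf
        out[i, tok] = 0.0  # the only finite logit, so its probability becomes 1
    for i, (token_ids, lam) in (bias or {}).items():
        out[i, list(token_ids)] += lam
    return out

def softmax(x):
    """Row-wise softmax that tolerates -inf entries (they map to probability 0)."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
modalities = ["category", "x", "y"]
clean = np.array([2, 7, 15])
noisy = corrupt_step(clean, modalities, alpha=0.1, gamma=0.3, rng=rng)

logits = rng.normal(size=(3, VOCAB_SIZE))
# Strong: the category is known to be token 2. Weak: nudge x toward tokens 5-6.
adjusted = constrain_logits(logits, modalities, fixed={0: 2}, bias={1: ([5, 6], 2.0)})
probs = softmax(adjusted)
print("noisy tokens:", noisy)
print("P(x token 5):", probs[1, 5])
```

Keeping the resampling within each modality means a corrupted x-coordinate token can never turn into a category token, and masking logits enforces the same invariant on the model's predictions; the strong/weak split then differs only in whether competing tokens are removed or merely down-weighted.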
