CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

Chin-Yi Cheng, Ruiqi Gao, Forrest Huang, Yang Li

TL;DR

CoLay addresses a gap in controllable layout generation by applying multi-conditional latent diffusion with four user-relevant condition types (text prompts, class/count, a given design, and guidelines). It combines a first-stage VAE, a multi-condition encoder, and a denoising network to produce layouts that include CSS style properties. The model is evaluated on CLAY and C4 with three new metrics: CycSim, C-Usage, and Design Distance. The approach significantly improves generation quality (FID) and alignment with the conditions, and it supports flexible editing workflows and practical designer-centric interactions. The work contributes a scalable framework for complex layouts across UI and web domains, offering practitioners a more expressive and controllable design tool.

Abstract

Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limit their adoption in practice. First, the limited expressiveness of the individual condition types used in previous works restricts designers' ability to convey complex design intentions and constraints. Second, most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties. To address these limitations, we propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties. Our approach outperforms prior works in terms of generation quality and condition satisfaction while empowering users to express their design intents using a flexible combination of modalities, including natural language prompts, layout guidelines, element types, and partially completed designs.

Paper Structure (26 sections, 2 equations, 9 figures, 10 tables)

Figures (9)

  • Figure 1: Multi-conditional layout generation. Left: UI layouts generated by CoLay trained on the CLAY dataset with four conditions. Right: webpage layouts generated by CoLay trained on the C4 dataset with two conditions.
  • Figure 2: Model architecture. (a) The VAE model is trained to convert layouts between vector-graphic and latent space, and the denoising network is trained on the encoded latent representations. (b) Each condition is encoded by its own encoder and passed through a dropping mechanism. (c) During sampling, the denoising network generates $\hat{z}_0$, and the decoder decodes it back to a layout.
  • Figure 3: Multi-conditional generation and editing. We use a step-by-step example to illustrate the experience of creating and editing layouts with multiple conditions using CoLay.
  • Figure 4: Dataset statistics. (a) Distribution of the number of elements per layout for CLAY and C4, where the x-axis is the number of elements in a layout and the y-axis is the corresponding probability. (b) Left: element type distribution for CLAY; right: element type distribution for C4.
  • Figure 5: Example UI layout paired with four types of generated UI summaries. From (a) to (d): generated summaries with different prompts. (e) Rendered input HTML. (f) Color legend.
  • ...and 4 more figures
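
The caption of Figure 2(b) says each encoded condition passes through a dropping mechanism, but it does not say how conditions are dropped. The sketch below is one plausible reading, not the authors' implementation: each condition is dropped independently with a fixed probability, so that training sees arbitrary subsets of conditions. The condition names, the `drop_conditions` helper, the `drop_prob` parameter, and the use of `None` as the "dropped" marker are all assumptions made for illustration.

```python
import random

# Hypothetical names for the four condition types listed in the TL;DR.
CONDITION_NAMES = ("text_prompt", "class_count", "given_design", "guidelines")


def drop_conditions(encoded, drop_prob, rng):
    """Independently replace each encoded condition with None with probability drop_prob.

    `encoded` maps a condition name to its embedding. None marks a dropped
    condition (an assumption; a real model might use a learned null embedding).
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    return {
        name: None if rng.random() < drop_prob else value
        for name, value in encoded.items()
    }


rng = random.Random(0)
# Placeholder embeddings standing in for the outputs of the per-condition encoders.
encoded = {name: [0.0, 0.0] for name in CONDITION_NAMES}
kept = drop_conditions(encoded, drop_prob=0.5, rng=rng)
active = [name for name, value in kept.items() if value is not None]
```

Under this assumption, a model trained with random dropping can be sampled with any subset of the conditions, which would be consistent with the flexible combinations of conditions described in the abstract.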