CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion
Chin-Yi Cheng, Ruiqi Gao, Forrest Huang, Yang Li
TL;DR
CoLay addresses a gap in controllable layout generation with a multi-conditional latent diffusion model that supports four user-relevant condition types: text prompts, class/count, a given design, and guidelines. It combines a first-stage VAE, a multi-condition encoder, and a denoising network to produce layouts that include CSS style properties. The model is evaluated on CLAY and C4 using the novel metrics CycSim, C-Usage, and Design Distance. The approach demonstrates significant improvements in generation quality (FID) and alignment with conditions, while enabling flexible editing workflows and practical designer-centric interactions. The work contributes a scalable framework for complex layouts across UI and web domains, offering practitioners a more expressive and controllable design tool.
Abstract
Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limit their adoption in practice. First, the limited expressiveness of the individual condition types used in previous work restricts designers' ability to convey complex design intentions and constraints. Second, most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties. To address these limitations, we propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties. Our approach outperforms prior work in terms of generation quality and condition satisfaction while empowering users to express their design intents using a flexible combination of modalities, including natural language prompts, layout guidelines, element types, and partially completed designs.
