
DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation

Zhaoxing Gan, Guangnan Ye

TL;DR

DogLayout introduces a diffusion-augmented GAN that enables discrete-label layout generation and dramatically speeds up sampling. By making all generator operations differentiable and having the discriminator operate on denoised representations, it overcomes the discrete-data challenges that hinder traditional GANs while requiring far fewer diffusion timesteps than standard diffusion models. The approach extends LayoutGAN-style generation to unconditional and completion tasks, achieving up to 175× faster sampling and lower overlap (16.43→9.59) than existing diffusion models while matching or surpassing baselines on standard metrics. This work has practical implications for real-time and interactive layout design and suggests that diffusion-assisted GANs may apply to other discrete-structure synthesis problems.

Abstract

Layout generation aims to synthesize plausible arrangements of given elements. Currently, the predominant methods in layout generation are Generative Adversarial Networks (GANs) and diffusion models, each of which presents its own challenges. GANs typically struggle with discrete data because they require differentiable generated samples, and they have historically avoided generating discrete labels directly by treating them as fixed conditions. Conversely, diffusion-based models, despite achieving state-of-the-art performance across several metrics, require many sampling steps, which leads to significant time costs. To address these limitations, we propose DogLayout (Denoising Diffusion GAN Layout model), which integrates a diffusion process into GANs to enable the generation of discrete label data and to significantly reduce diffusion's sampling time. Experiments demonstrate that DogLayout reduces sampling costs by up to 175 times and cuts overlap from 16.43 to 9.59 compared to existing diffusion models, while also surpassing GAN-based and other layout methods. Code is available at https://github.com/deadsmither5/DogLayout.
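The abstract's central idea, letting a GAN emit discrete labels by routing them through a diffusion process, can be illustrated with a small sketch. It assumes (this is not taken from the paper) that each element's class label is encoded as a one-hot vector and its box as four coordinates in [0, 1], both rescaled to [-1, 1] and concatenated, so that a single Gaussian forward process $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ can noise the whole layout. Discrete labels are recovered with an argmax only at decoding time, so everything upstream stays continuous and differentiable. The helper names and the four-step schedule values are hypothetical.

```python
import numpy as np

# Hypothetical helper: illustrative linear beta schedule for a few-step
# diffusion process (values are assumptions, not DogLayout's settings).
def make_alpha_bars(num_steps, beta_start=0.1, beta_end=0.9):
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)  # alpha_bar_t for t = 0..num_steps-1

# Assumed encoding: one-hot labels and [0, 1] boxes, both mapped to [-1, 1]
# and concatenated into one continuous vector per element.
def encode_layout(labels, boxes, num_classes):
    one_hot = np.eye(num_classes)[labels]  # shape (N, C)
    return np.concatenate([2.0 * one_hot - 1.0, 2.0 * boxes - 1.0], axis=1)

# Discrete labels are recovered only here, via an argmax over class scores.
def decode_layout(x, num_classes):
    labels = x[:, :num_classes].argmax(axis=1)
    boxes = np.clip((x[:, num_classes:] + 1.0) / 2.0, 0.0, 1.0)
    return labels, boxes

# Closed-form forward process: x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps.
def q_sample(x0, t, alpha_bars, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
labels = np.array([0, 2, 1])
boxes = np.array([[0.1, 0.1, 0.5, 0.2],
                  [0.1, 0.4, 0.8, 0.3],
                  [0.6, 0.8, 0.3, 0.1]])
x0 = encode_layout(labels, boxes, num_classes=3)
alpha_bars = make_alpha_bars(4)
x_noisy = q_sample(x0, 3, alpha_bars, rng)  # most-noised step
print(decode_layout(x0, 3)[0])  # [0 2 1]
```

Decoding a freshly encoded layout returns the original labels and boxes. With these illustrative values, $\bar{\alpha}_t \approx 0.02$ at the last step, so `x_noisy` is close to pure Gaussian noise.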

Paper Structure

This paper contains 34 sections, 9 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Visualization of DogLayout's inference process. During inference, we first sample a noisy layout from a standard Gaussian distribution. The generator then takes it as input and outputs the predicted clean layout. Next, we derive a less noisy layout by adding noise to the predicted clean layout. This process is repeated until the final clean layout is obtained.
  • Figure 2: Overview of our method. (a) During training, we first obtain the noisy layout $x_{t-1}$ and then generate $x_{t}$ by adding noise directly to $x_{t-1}$. The generator then takes $x_{t}$ and an additional latent variable $z$ as inputs and outputs the predicted clean layout $x'_{0}$. Subsequently, we derive the predicted $x'_{t-1}$ from the posterior $q(x_{t-1} \mid x_{t}, x_{0})$, with $x'_{0}$ in place of $x_{0}$. For real data, the discriminator evaluates the real noisy layout $x_{t}$ together with $x_{t-1}$ to determine whether $x_{t-1}$ is the true denoised layout of $x_{t}$. An additional decoder takes the global context token $h$ from the discriminator and reconstructs $x_{0}$, which forces the discriminator to learn meaningful layout attributes. For fake data, the discriminator assesses the real noisy layout $x_{t}$ and the predicted layout $x'_{t-1}$ to determine whether $x'_{t-1}$ is the true denoised layout of $x_{t}$. The model architectures are shown in (b), (c), and (d).
  • Figure 3: Results of the user studies for Gen-Type (C→S+P and C+S→P), Completion, and Uncond. For each selection, we present samples from multiple models and count how many participants prefer the layouts generated by each model.
  • Figure 4: Qualitative comparison with LayoutGAN++ and LayoutDM on Rico and PubLayNet for four generation tasks. Different colors represent different label classes, and the decimals in parentheses are the width (w) and height (h) values.
  • Figure 5: Comparative analysis of the discriminator's ability to distinguish real from fake (generated) data during training for LayoutGAN++ and DogLayout.
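Figure 2 obtains $x'_{t-1}$ from the predicted clean layout via the diffusion posterior. Assuming DogLayout uses the standard Gaussian DDPM forward process with schedule $\beta_t$, $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ (notation assumed here rather than taken from the paper), that posterior is:

```latex
\begin{aligned}
q(x_{t-1} \mid x_t, x_0) &= \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right),\\
\tilde{\mu}_t(x_t, x_0) &= \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0
  + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t,\\
\tilde{\beta}_t &= \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t .
\end{aligned}
```

When generating, $x_0$ is replaced by the generator output $x'_0$. With the convention $\bar{\alpha}_0 = 1$, the step from $t = 1$ gives $\tilde{\mu}_1 = x'_0$ and $\tilde{\beta}_1 = 0$, so the final sample is exactly the generator's last prediction.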
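The inference loop of Figure 1 can be sketched as follows. The sketch assumes that the "less noisy layout" step is a draw from the Gaussian posterior $q(x_{t-1} \mid x_t, x_0)$ with the generator's prediction $x'_0$ substituted for $x_0$, as in denoising diffusion GANs. The function names, the toy generator, the latent size, and the four-step linear schedule are all hypothetical, not DogLayout's actual code.

```python
import numpy as np

# Illustrative few-step linear schedule (assumed values, not the paper's).
def make_schedule(num_steps, beta_start=0.1, beta_end=0.9):
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

# Draw x_{t-1} ~ q(x_{t-1} | x_t, x0_pred). Here t is a 0-based step index,
# so t = 0 is the last denoising step and alpha_bar_prev is taken as 1 there.
def posterior_sample(x_t, x0_pred, t, betas, alphas, alpha_bars, rng):
    alpha_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    denom = 1.0 - alpha_bars[t]
    coef_x0 = np.sqrt(alpha_bar_prev) * betas[t] / denom
    coef_xt = np.sqrt(alphas[t]) * (1.0 - alpha_bar_prev) / denom
    mean = coef_x0 * x0_pred + coef_xt * x_t
    if t == 0:
        return mean  # posterior variance is zero at the final step
    var = (1.0 - alpha_bar_prev) / denom * betas[t]
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)

# Figure 1 loop: start from Gaussian noise, predict x'_0 with the generator,
# step to a less noisy layout, and repeat until t = 0.
def sample_layout(generator, shape, num_steps, latent_dim, rng):
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x_t = rng.standard_normal(shape)  # noisiest layout ~ N(0, I)
    for t in reversed(range(num_steps)):
        z = rng.standard_normal(latent_dim)  # fresh latent per step (assumed)
        x0_pred = generator(x_t, z, t)
        x_t = posterior_sample(x_t, x0_pred, t, betas, alphas, alpha_bars, rng)
    return x_t

# Toy stand-in for the trained generator (hypothetical): always predicts one layout.
target = np.full((3, 7), 0.5)
layout = sample_layout(lambda x_t, z, t: target, (3, 7),
                       num_steps=4, latent_dim=8, rng=np.random.default_rng(0))
```

With a generator that always predicts the same layout, the loop returns exactly that prediction, because at the final step the posterior mean reduces to $x'_0$ and the variance is zero. The small number of generator calls in this loop is what makes sampling fast compared with diffusion models that take many denoising steps.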