
Desigen: A Pipeline for Controllable Design Template Generation

Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen

TL;DR

Desigen is an automatic template creation pipeline that generates background images and harmonious layout elements over them, and uses an iterative inference strategy to adjust the synthesized background and layout over multiple rounds.

Abstract

Templates serve as a good starting point for implementing a design (e.g., banner, slide), but creating them manually takes great effort from designers. In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background. Unlike natural images, a background image should preserve enough non-salient space for the overlaying layout elements. To equip existing advanced diffusion-based models with stronger spatial control, we propose two simple but effective techniques to constrain the saliency distribution and reduce the attention weight in desired regions during the background generation process. Then, conditioned on the background, we synthesize the layout with a Transformer-based autoregressive generator. To achieve a more harmonious composition, we propose an iterative inference strategy to adjust the synthesized background and layout in multiple rounds. We constructed a design dataset with more than 40k advertisement banners to verify our approach. Extensive experiments demonstrate that the proposed pipeline generates high-quality templates comparable to those of human designers. Beyond single-page design, we further show an application of presentation generation that outputs a set of theme-consistent slides. The data and code are available at https://whaohan.github.io/desigen.
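
To make the idea of reducing attention weight in desired regions more concrete, the following is a minimal sketch and not the authors' implementation. It assumes cross-attention weights are stored per spatial position and per text token, and that a binary mask marks the regions to keep non-salient. The function name `reduce_attention`, the `scale` parameter, and the simple multiplicative rule are all hypothetical choices made for illustration; the exact formulation used in the paper is not stated in this summary.

```python
from typing import List

Attention = List[List[List[float]]]  # indexed as [row][col][token]
Mask = List[List[int]]               # indexed as [row][col]; 1 = keep non-salient


def reduce_attention(attn: Attention, mask: Mask, scale: float = 0.1) -> Attention:
    """Scale down cross-attention weights wherever the mask is set.

    Assumption: a plain multiplicative reduction. Positions outside the
    mask are returned unchanged (as copies).
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be in [0, 1]")
    if len(attn) != len(mask) or any(len(a) != len(m) for a, m in zip(attn, mask)):
        raise ValueError("attn and mask must share the same spatial size")
    return [
        [
            [w * scale for w in tokens] if keep_clear else list(tokens)
            for tokens, keep_clear in zip(attn_row, mask_row)
        ]
        for attn_row, mask_row in zip(attn, mask)
    ]


# Toy example: one row, two positions, two text tokens.
attn = [[[0.8, 0.2], [0.5, 0.5]]]
mask = [[1, 0]]  # only the first position should stay non-salient
reduced = reduce_attention(attn, mask, scale=0.5)
```

In this toy example only the first position is masked, so its weights are halved while the second position is left untouched. A real diffusion model would apply such a rule inside its cross-attention layers at each denoising step, which this sketch does not attempt.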

Paper Structure

This paper contains 17 sections, 6 equations, 17 figures, and 5 tables.

Figures (17)

  • Figure 1: Design templates (background image and layout elements) generated by Desigen with in-the-wild prompts and layout specification. Our proposed pipeline flexibly supports synthesizing the templates from scratch (1st and 2nd columns) or conditioned on a fixed layout (3rd and 4th columns).
  • Figure 2: The process of design template generation. Desigen first synthesizes the background from a text description. A layout mask is an optional condition that specifies regions to be preserved as non-salient space. The layout is then generated given the background and the specification (type and quantity of layout elements).
  • Figure 3: Overview of Desigen. (a) The background generator synthesizes background images from text descriptions; (b) the layout generator creates layouts conditioned on the given backgrounds. Through attention reduction, the synthesized backgrounds can be further refined based on input/layout masks for a more harmonious composition.
  • Figure 4: The cosine similarity between cross-attention maps and saliency maps. It shows that the attention maps are highly correlated with the corresponding saliency maps.
  • Figure 5: Generated backgrounds given prompts in the test dataset. Compared with baselines, our model generates backgrounds with more space preserved, approaching the real designs.
  • ...and 12 more figures
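
The comparison described in the Figure 4 caption can be sketched as follows. This is an illustrative example only: it assumes both maps are 2D grids of the same size and flattens them before computing cosine similarity. The function name `cosine_similarity` and the `eps` guard are hypothetical, and how the paper extracts and aggregates its attention maps is not covered here.

```python
import math
from typing import Sequence


def cosine_similarity(
    attn_map: Sequence[Sequence[float]],
    saliency_map: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Cosine similarity between two same-sized 2D maps, after flattening."""
    a = [v for row in attn_map for v in row]
    s = [v for row in saliency_map for v in row]
    if len(a) != len(s):
        raise ValueError("maps must contain the same number of values")
    dot = sum(x * y for x, y in zip(a, s))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_s = math.sqrt(sum(y * y for y in s))
    # eps avoids division by zero when either map is all zeros.
    return dot / max(norm_a * norm_s, eps)


# Toy maps: identical maps score close to 1, disjoint maps score 0.
same = cosine_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
disjoint = cosine_similarity([[1.0, 0.0]], [[0.0, 1.0]])
```

A value near 1 indicates that the attention map and the saliency map highlight the same regions, which is the relationship the caption reports.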