Relation-Aware Diffusion Model for Controllable Poster Layout Generation

Fengheng Li, An Liu, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Jingping Shao

TL;DR

This work tackles poster layout generation by explicitly modeling the interplay between visual content, textual information, and geometric relations using a diffusion-based framework. It introduces two modules, VTRAM and GRAM, to align text with visual regions and to encode relative spatial relationships among elements, respectively, enabling controllable and content-aware layouts. A new dataset, CGL-Dataset V2, with text annotations, supports robust training and evaluation, and experiments show clear improvements over both content-aware and content-agnostic baselines in user studies and composition-quality metrics. The approach promises practical impact for automatic, high-quality poster design with user-guided controllability.

Abstract

Poster layout is a crucial aspect of poster design. Prior methods primarily focus on the correlation between visual content and graphic elements. However, a pleasant layout should also consider the relationship between visual and textual contents and the relationship between elements. In this study, we introduce a relation-aware diffusion model for poster layout generation that incorporates these two relationships in the generation process. Firstly, we devise a visual-textual relation-aware module that aligns the visual and textual representations across modalities, thereby enhancing the layout's efficacy in conveying textual information. Subsequently, we propose a geometry relation-aware module that learns the geometry relationship between elements by comprehensively considering contextual information. Additionally, the proposed method can generate diverse layouts based on user constraints. To advance research in this field, we have constructed a poster layout dataset named CGL-Dataset V2. Our proposed method outperforms state-of-the-art methods on CGL-Dataset V2. The data and code will be available at https://github.com/liuan0803/RADM.

Paper Structure (23 sections, 12 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Visual examples of poster layouts produced by CGL-GAN (Zhou et al., 2022) and by our method.
  • Figure 2: (a) Poster layout annotation. Different colors represent different element types; the text annotations are shown in the gray box, with English translations in brackets; (b) Clean image; (c) Input for the inference stage.
  • Figure 3: The overview of our method, which contains four parts: feature extractor, VTRAM, GRAM and layout decoder.
  • Figure 4: Inspired by the diffusion denoising process, we formulate poster layout generation as gradually refining the position and size of boxes, shown from left to right, from step $T$ to step $i$.
  • Figure 5: Overview of the VTRAM. It takes text features, RoI features, and the corresponding coordinates as input. The coordinate information is first embedded into the RoI features to obtain $V_{ip}$. Next, scaled dot-product attention (Vaswani et al., 2017) is computed using the visual position feature $V_{ip}$ as the query and the text features $L$ as the key and value.
  • ...and 6 more figures
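
The attention step described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projection matrices, and the function name `cross_attention` and the toy inputs are hypothetical. The queries stand in for the visual position features $V_{ip}$, and the keys and values both stand in for the text features $L$.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: one vector per RoI (stand-in for V_ip)
    keys, values: one vector per text token (stand-in for L)
    Learned projections are omitted (an assumption of this sketch).
    """
    d = len(keys[0])
    out_dim = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this RoI query to every text token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the text value vectors.
        fused.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)])
    return fused


# Hypothetical toy inputs: 3 RoIs and 2 text tokens, feature size 2.
v_ip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(v_ip, text, text)
```

Each row of `fused` is a convex combination of the text vectors, so every RoI receives a text-conditioned feature weighted by how strongly its query matches each token.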