Planning and Rendering: Towards Product Poster Generation with Diffusion Models

Zhaochen Li, Fengheng Li, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Zhenglu Yang

TL;DR

This work tackles Product Poster Generation (PPG) by introducing a diffusion-based Planning and Rendering (P&R) framework that jointly plans layouts and renders coherent posters. PlanNet encodes product images and texts and performs discrete-diffusion layout planning over $K$ states in $T_P$ steps to produce flexible, content-aware layouts, while RenderNet fuses those layouts with the product appearance via a spatial fusion module and ControlNet conditioning to generate harmonious backgrounds, followed by a heuristic text-rendering module for final typography. A new large-scale dataset, PPG30k, supports training and evaluation with annotated posters, layouts, masks, and text. Empirical results show PlanNet improves layout rationality and diversity, and RenderNet yields more poster-like backgrounds compared to inpainting baselines and previous layout-to-image methods, validated by quantitative metrics and practitioner user studies. Overall, P&R enables efficient, versatile, and visually appealing product posters for e-commerce contexts, with potential for integrating advanced visual text generation in future work.
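To make the "discrete-diffusion layout planning over $K$ states in $T_P$ steps" concrete, the following is a minimal sketch of the forward (corruption) process only. It assumes a uniform transition kernel, in which each layout token is resampled uniformly over the $K$ states with probability $\beta_t$ at step $t$; this summary does not state the kernel PlanNet actually uses. The function names, the linear $\beta$ schedule, and the toy token sequence are all hypothetical.

```python
import random


def cumulative_keep_prob(betas, t):
    """Probability that a token has not been resampled during the first t steps."""
    keep = 1.0
    for beta in betas[:t]:
        keep *= 1.0 - beta
    return keep


def corrupt_layout(tokens, t, betas, num_states, rng):
    """Sample x_t from q(x_t | x_0) under an assumed uniform transition kernel.

    Each token keeps its original value with probability prod_{s<=t}(1 - beta_s);
    otherwise it is redrawn uniformly from the num_states (K) discrete states.
    """
    keep = cumulative_keep_prob(betas, t)
    return [
        tok if rng.random() < keep else rng.randrange(num_states)
        for tok in tokens
    ]


if __name__ == "__main__":
    K = 128    # hypothetical number of discrete states per layout token
    T_P = 100  # hypothetical number of planning diffusion steps
    # Hypothetical linear noise schedule.
    betas = [0.001 + (0.05 - 0.001) * s / (T_P - 1) for s in range(T_P)]
    # Toy layout element quantized as (class, x, y, w, h); the values are made up.
    x0 = [3, 40, 22, 64, 18]
    rng = random.Random(0)
    for t in (0, T_P // 2, T_P):
        print(t, corrupt_layout(x0, t, betas, K, rng))
```

A layout decoder would then be trained to reverse this corruption, conditioned on the product image and text features; that part is not sketched here.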

Abstract

Product poster generation significantly optimizes design efficiency and reduces production costs. Prevailing methods predominantly rely on image inpainting to generate clean background images for given products, and then apply poster layout generation methods to produce the corresponding layouts. However, the background images may be too complex to accommodate textual content, and the fixed location of products limits the diversity of layout results. To alleviate these issues, we propose P&R, a novel product poster generation framework based on diffusion models. P&R draws inspiration from the workflow of designers creating posters and consists of two stages: Planning and Rendering. At the planning stage, we propose PlanNet to generate the layout of the product and other visual components, considering both the appearance features of the product and the semantic features of the text, which improves the diversity and rationality of the layouts. At the rendering stage, we propose RenderNet to generate the background for the product while considering the generated layout, where a spatial fusion module is introduced to fuse the layouts of different visual components. To foster the advancement of this field, we propose the first product poster generation dataset, PPG30k, comprising 30k exquisite product poster images along with comprehensive image and text annotations. Our method outperforms the state-of-the-art product poster generation methods on PPG30k. PPG30k will be released soon.

Paper Structure (16 sections, 11 equations, 11 figures, 6 tables)

Figures (11)

  • Figure 1: Two fields related to PPG: (a) Image-inpainting and (b) Poster layout generation. (c) The combination of (a) and (b) can be regarded as a naive solution to PPG. (d) The workflow of professional designers. "score" refers to the confidence of layout prediction, and the gray blocks are translations of texts. Green and red boxes represent texts and underlays, respectively.
  • Figure 2: The framework of P&R. It consists of a PlanNet and a RenderNet. The PlanNet aims to generate layouts based on the product image and texts, and the RenderNet aims to generate posters based on the product image and the generated layouts from PlanNet. The gray blocks contain texts and translations in the brackets.
  • Figure 3: The architecture of the layout decoder.
  • Figure 4: The architecture of the spatial fusion module.
  • Figure 5: Some visual examples in PPG30k.
  • ...and 6 more figures