PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu, Min Zhou, Tiezheng Ge, Bo Zheng, Hongtao Xie

TL;DR

PosterMaker tackles end-to-end product poster generation by integrating TextRenderNet for precise multilingual text rendering and SceneGenNet for subject-preserved background generation within a Stable Diffusion 3 framework. It introduces a robust character-level visual representation as the control signal, formalized by the poster generation operator $I_g = f(I_s, M_s, T, P)$, where $I_g$ is the generated poster, $I_s$ the subject image, $M_s$ the subject mask, $T$ the text content/layout, and $P$ the scene prompt. A two-stage training strategy decouples text rendering from background learning, and subject fidelity is further improved via a foreground extension detector and subject fidelity feedback learning. Empirical results on PosterBenchmark show state-of-the-art text rendering accuracy and improved subject fidelity, validating the effectiveness of end-to-end poster synthesis with robust character-level text control.
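The operator $I_g = f(I_s, M_s, T, P)$ can be read as a function signature. The following is a minimal sketch of its inputs only, not the authors' interface: the class names, field names, the `(x0, y0, x1, y1)` pixel box format, and the `validate` helper are all hypothetical, and images are stood in for by their sizes so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout format: (x0, y0, x1, y1) in pixels. The source does not specify one.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TextLine:
    content: str  # the string to render
    box: Box      # where the string should appear on the poster


@dataclass(frozen=True)
class PosterRequest:
    """Inputs of I_g = f(I_s, M_s, T, P); every field name here is hypothetical."""
    subject_size: Tuple[int, int]  # (width, height) of the subject image I_s
    mask_size: Tuple[int, int]     # (width, height) of the subject mask M_s
    texts: List[TextLine]          # T: text content together with its layout
    prompt: str                    # P: scene prompt


def validate(req: PosterRequest) -> List[str]:
    """Return a list of problems; an empty list means the inputs are consistent."""
    problems = []
    if req.subject_size != req.mask_size:
        problems.append("subject image and mask must have the same size")
    width, height = req.subject_size
    for i, line in enumerate(req.texts):
        x0, y0, x1, y1 = line.box
        if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
            problems.append(f"text line {i} has a box outside the canvas")
        if not line.content:
            problems.append(f"text line {i} is empty")
    if not req.prompt.strip():
        problems.append("scene prompt is empty")
    return problems


request = PosterRequest(
    subject_size=(1024, 1024),
    mask_size=(1024, 1024),
    texts=[TextLine("新品上市", (100, 50, 500, 150))],
    prompt="a glass bottle on a marble table, soft light",
)
print(validate(request))
```

Keeping the text content and its layout together in one `TextLine` mirrors the description of $T$ as "text content/layout"; a real system would pass image tensors rather than sizes.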

Abstract

Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers. Creating such posters with modern image generation methods is valuable, but the main challenge lies in rendering text accurately, especially for complex writing systems such as Chinese, which contains over 10,000 individual characters. In this work, we identify the key to precise text rendering as constructing a character-discriminative visual feature that serves as the control signal. Based on this insight, we propose a robust character-wise representation as control and develop TextRenderNet, which achieves a text rendering accuracy of over 90%. Another challenge in poster generation is maintaining the fidelity of user-specified products. We address this by introducing SceneGenNet, an inpainting-based model, and propose subject fidelity feedback learning to further enhance fidelity. Based on TextRenderNet and SceneGenNet, we present PosterMaker, an end-to-end generation framework. To optimize PosterMaker efficiently, we implement a two-stage training strategy that decouples the learning of text rendering from the learning of background generation. Experimental results show that PosterMaker outperforms existing baselines by a remarkable margin, which demonstrates its effectiveness.
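The decoupled two-stage training can be pictured as a schedule of which module receives gradient updates in each stage. The sketch below is an illustration under stated assumptions: the abstract only says that text rendering and background generation are learned separately, so the specific choice of training TextRenderNet first, then SceneGenNet, with the SD3 backbone frozen throughout, is an assumption, and the module keys and the `trainable_modules` function are hypothetical names.

```python
from typing import Dict

# Hypothetical module keys for the three components named in the text.
MODULES = ("sd3_backbone", "text_render_net", "scene_gen_net")


def trainable_modules(stage: int) -> Dict[str, bool]:
    """Map each module to whether it is updated in the given stage.

    Assumption (not stated in the abstract): stage 1 trains only the text
    branch, stage 2 trains only the scene branch, and the backbone stays frozen.
    """
    if stage == 1:
        trainable = {"text_render_net"}
    elif stage == 2:
        trainable = {"scene_gen_net"}
    else:
        raise ValueError(f"unknown stage: {stage}")
    return {name: name in trainable for name in MODULES}


for stage in (1, 2):
    print(stage, trainable_modules(stage))
```

Under this assumed schedule no module is updated in both stages, which is one simple way to keep the text rendering objective from interfering with background learning.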

Paper Structure

This paper contains 28 sections, 4 equations, 17 figures, 10 tables.

Figures (17)

  • Figure 1: (a) Definition of the advertising product poster generation task. The input includes the prompt, subject image, and the texts to be rendered with their layouts. The output is the poster image. (b) The comparison of our method with the previous method. PosterMaker generates posters end-to-end, while previous methods first generate poster backgrounds and then render texts. (c) Visualization results demonstrate that PosterMaker can generate harmonious and aesthetically pleasing posters with accurate texts and maintain subject fidelity.
  • Figure 2: Illustration of the three challenges in poster generation that seriously hinder its practical application.
  • Figure 3: The framework of PosterMaker, which is built on SD3. To generate multilingual text precisely and create aesthetically pleasing poster scenes, TextRenderNet and SceneGenNet are introduced; their outputs are added to SD3 as control conditions.
  • Figure 4: The details of TextRenderNet and SceneGenNet, showcasing their model architectures and their interactions with SD3.
  • Figure 5: The distinction between our proposed multilingual character-level text representation and the line-level methods of previous works such as AnyText (Tuo et al., 2023) and GlyphDraw2 (Ma et al., 2024).
  • ...and 12 more figures