InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation

Yuxin Qin, Ke Cao, Haowei Liu, Ao Ma, Fengheng Li, Honghe Zhu, Zheng Zhang, Run Ling, Wei Feng, Xuanhua He, Zhanjie Zhang, Zhen Guo, Haoyi Bian, Jingjing Lv, Junjie Shen, Ching Law

TL;DR

This work proposes InnoAds-Composer, a single-stage framework that enables efficient, token-based tri-conditional control over subject, glyph, and style, and that significantly outperforms existing product poster methods without noticeably increasing inference latency.

Abstract

E-commerce product poster generation aims to automatically synthesize a single image that effectively conveys product information by presenting a subject, text, and a designed style. Recent diffusion models with fine-grained and efficient controllability have advanced product poster synthesis, yet they typically rely on multi-stage pipelines, and simultaneous control over subject, text, and style remains underexplored. Such multi-stage pipelines also suffer from three issues: poor subject fidelity, inaccurate text, and inconsistent style. To address these issues, we propose InnoAds-Composer, a single-stage framework that enables efficient, token-based tri-conditional control over subject, glyph, and style. To alleviate the quadratic overhead introduced by naive tri-conditional token concatenation, we perform importance analysis over layers and timesteps and route each condition only to the most responsive positions, thereby shortening the active token sequence. In addition, to improve the accuracy of Chinese text rendering, we design a Text Feature Enhancement Module (TFEM) that integrates features from both glyph images and glyph crops. To support training and evaluation, we also construct a high-quality e-commerce product poster dataset and benchmark, which is the first dataset that jointly contains subject, text, and style conditions. Extensive experiments demonstrate that InnoAds-Composer significantly outperforms existing product poster methods without noticeably increasing inference latency.
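The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the importance scores are random placeholders, the token counts and the top-k selection rule are assumptions, and the cost model (squared active sequence length summed over every layer and timestep) is a deliberate simplification of full self-attention over a concatenated sequence. The names `top_k_routes` and `attention_cost` are hypothetical.

```python
import random


def top_k_routes(importance, k):
    """Return the k (layer, timestep) cells with the highest importance."""
    cells = [
        (score, layer, step)
        for layer, row in enumerate(importance)
        for step, score in enumerate(row)
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return {(layer, step) for _, layer, step in cells[:k]}


def attention_cost(base_tokens, cond_tokens, routes, num_layers, num_steps):
    """Sum of squared active sequence lengths over all (layer, timestep) cells."""
    total = 0
    for layer in range(num_layers):
        for step in range(num_steps):
            # Only conditions routed to this cell contribute tokens here.
            active = sum(
                count
                for name, count in cond_tokens.items()
                if (layer, step) in routes[name]
            )
            total += (base_tokens + active) ** 2
    return total


# Hypothetical sizes, chosen only for illustration.
NUM_LAYERS, NUM_STEPS, BASE_TOKENS = 4, 3, 1024
cond_tokens = {"subject": 256, "glyph": 256, "style": 256}

# Placeholder importance scores; a real analysis would measure these.
rng = random.Random(0)
importance = {
    name: [[rng.random() for _ in range(NUM_STEPS)] for _ in range(NUM_LAYERS)]
    for name in cond_tokens
}

# Naive concatenation: every condition is present in every cell.
all_cells = {(l, s) for l in range(NUM_LAYERS) for s in range(NUM_STEPS)}
naive_routes = {name: all_cells for name in cond_tokens}

# Routed: each condition is present only in its 3 highest-scoring cells.
routed = {name: top_k_routes(importance[name], k=3) for name in cond_tokens}

naive_cost = attention_cost(BASE_TOKENS, cond_tokens, naive_routes, NUM_LAYERS, NUM_STEPS)
routed_cost = attention_cost(BASE_TOKENS, cond_tokens, routed, NUM_LAYERS, NUM_STEPS)
print(f"routed / naive attention cost: {routed_cost / naive_cost:.2f}")
```

Because each condition contributes tokens only in the cells it is routed to, the active sequence is shorter almost everywhere, and the quadratic term shrinks accordingly.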

Paper Structure (22 sections, 5 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: InnoAds-Composer generates high-quality e-commerce posters under three independent controls: background style, subject appearance, and glyph text. Each row varies a single condition while keeping the other two fixed. The bottom-right inset shows the input of the varied condition.
  • Figure 2: Case Examples and Dataset Construction Pipeline for E-commerce Poster Generation.
  • Figure 3: Overview of InnoAds-Composer. The framework comprises three modules: (1) Multi-Condition Tokenization, which maps heterogeneous controls into a shared token space and aligns them with the MM-DiT backbone; (2) Importance-Aware Condition Injection, which routes each control to its most important layers to improve efficiency while preserving controllability; and (3) Decoupled Attention, which lets the main stream attend to condition cues while the condition branch performs self-attention only, removing the extra attention path to reduce cost and maintain training–inference consistency.
  • Figure 4: The importance heatmaps of the three conditions across timesteps and layers.
  • Figure 5: Qualitative results. Left: Input conditions, including C1-style images, C2-glyph images, and C3-subject images. Right: Results generated by different methods.
  • ...and 4 more figures
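
The Decoupled Attention described in the Figure 3 caption can be pictured as an attention mask. The sketch below is an assumption-laden illustration rather than the paper's code: it treats all condition tokens as a single group, places main-stream tokens first in the sequence, and uses a hypothetical helper name, `decoupled_attention_mask`.

```python
def decoupled_attention_mask(num_main, num_cond):
    """Build a mask where mask[q][k] is True if query q may attend to key k.

    Tokens 0..num_main-1 are the main stream; the rest are condition tokens.
    """
    total = num_main + num_cond
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < num_main:
                # Main stream attends to itself and to the condition cues.
                mask[q][k] = True
            else:
                # Condition branch performs self-attention only.
                mask[q][k] = k >= num_main
    return mask


mask = decoupled_attention_mask(num_main=2, num_cond=2)
allowed = sum(cell for row in mask for cell in row)
print(f"{allowed} of {len(mask) ** 2} query-key pairs are allowed")
```

Under this layout the condition rows never depend on the main stream, which is what removes the condition-to-main attention path.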