Towards Reliable Advertising Image Generation Using Human Feedback

Zhenbang Du, Wei Feng, Haohan Wang, Yaoyu Li, Jingsen Wang, Jian Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junsheng Jin, Junjie Shen, Zhangang Lin, Jingping Shao

TL;DR

The work tackles the problem of low availability of reliable advertising images produced by automatic generation in e-commerce. It combines a multi-modal Reliable Feedback Network (RFNet) with a Recurrent Generation loop and a Consistent Condition regularization–based refinement (RFFT) to automate inspection and accelerate production while preserving visual quality. A large RF1M dataset with rich human annotations trains RFNet to mirror human judgments and guide diffusion-model fine-tuning without collapsing aesthetics. The approach yields higher available rates and more efficient production, enabling scalable, reliable AI-assisted advertising, with considerations for ethics and potential CTR-based future feedback.

Abstract

In the e-commerce realm, compelling advertising images are pivotal for attracting customer attention. While generative models automate image generation, they often produce substandard images that may mislead customers and require significant labor costs to inspect. This paper focuses on increasing the rate of available generated images. We first introduce a multi-modal Reliable Feedback Network (RFNet) to automatically inspect the generated images. Integrating RFNet into a recurrent process, Recurrent Generation, yields a higher number of available advertising images. To further enhance production efficiency, we fine-tune diffusion models with an innovative Consistent Condition regularization utilizing the feedback from RFNet (RFFT). This results in a remarkable increase in the available rate of generated images, reduces the number of attempts in Recurrent Generation, and provides a highly efficient production process without sacrificing visual appeal. We also construct a Reliable Feedback 1 Million (RF1M) dataset, which comprises over one million generated advertising images annotated by humans and helps to train RFNet to accurately assess the availability of generated images and faithfully reflect human feedback. Overall, our approach offers a reliable solution for advertising image generation.
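
The Recurrent Generation idea described in the abstract (generate, inspect, retry until an image is judged available) can be illustrated with a minimal sketch. This is not the authors' implementation: `generate`, `inspect`, `max_attempts`, and `success_within` are hypothetical names introduced here, the generator and inspector are stand-in callables rather than a diffusion model or RFNet, and the probability helper assumes attempts are independent with a fixed available rate, which the source does not state.

```python
from typing import Callable, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def recurrent_generation(
    generate: Callable[[], Image],
    inspect: Callable[[Image], bool],
    max_attempts: int = 5,
) -> Tuple[Optional[Image], int]:
    """Retry generation until the inspector accepts an image.

    generate: hypothetical stand-in for the image generator.
    inspect:  hypothetical stand-in for an RFNet-style availability check.
    Returns (accepted image or None, number of attempts used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        image = generate()
        if inspect(image):
            return image, attempt
    # No attempt passed inspection within the budget.
    return None, max_attempts


def success_within(available_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries is available.

    Assumes independent attempts with a fixed per-attempt available rate
    (an illustrative assumption, not a claim from the paper).
    """
    if not 0.0 <= available_rate <= 1.0:
        raise ValueError("available_rate must be in [0, 1]")
    return 1.0 - (1.0 - available_rate) ** attempts
```

Under that independence assumption, a higher per-attempt available rate means fewer attempts are needed to reach the same success probability, which matches the abstract's point that fine-tuning with RFNet feedback reduces the number of attempts in Recurrent Generation.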

Paper Structure

This paper contains 31 sections, 9 equations, 20 figures, 6 tables, 1 algorithm.

Figures (20)

  • Figure 1: The available generated advertising images and different types of bad cases. The products are highlighted by blue masks. Bad cases convey misleading information, e.g., unrealistic product sizes or shapes, and customers may have difficulty discerning the products in the images.
  • Figure 2: Some examples in RF1M. Each comprises rich annotations. The translations of Chinese captions are in the brackets.
  • Figure 3: An overview of the image generation-inspection pipeline. The advertising image is generated from the product image and prompt by inpainting, and the feedback $F_{AC}$ provided by RFNet is used to fine-tune the ControlNet with Consistent Condition regularization.
  • Figure 3: Availability evaluation ($\%$) of different approaches using one-attempt Recurrent Generation (RG).
  • Figure 4: The proposed RFNet. Multiple auxiliary modalities contribute to the final inspection. The translation of the Chinese caption is in the brackets.
  • ...and 15 more figures