An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency

Yuqi Liang, Jun Luo, Xiaoxi Guo, Jianqi Bi

TL;DR

The paper tackles evaluating AI-based background inpainting for product images, where background appropriateness and product preservation are crucial yet poorly captured by existing metrics. It introduces HFPC, a dual-module framework with an image-referenced reward model based on BLIP to rate background quality and a product-consistency evaluator using segmentation (EfficientSAM) guided by GroundingDino to ensure products remain faithful after inpainting. A large HFPC-44k dataset (~44k image pairs) with human labels is built and used to train these components, including data-balancing across 25 product categories. Empirical results show state-of-the-art precision (96.4%) and substantial reductions in manual annotation needs, with ablations and visualizations clarifying the contribution of each module and the attention mechanisms. The work promises practical impact for e-commerce pipelines and opens up future work on online feedback and reinforcement-learning–based improvements to generative models.

Abstract

In product advertising, using AI to automatically inpaint the backgrounds of product images has become an important task. However, current techniques still produce inappropriate backgrounds and alter the product itself, and existing methods for evaluating generated product images are mostly inconsistent with human feedback, so evaluation still depends on manual annotation. To address these issues, this paper proposes Human Feedback and Product Consistency (HFPC), which automatically assesses generated product images with two modules. First, to detect inappropriate backgrounds, human feedback on 44,000 automatically inpainted product images is collected to train a reward model based on multi-modal features extracted from BLIP and comparative learning. Second, to filter out generated images containing inconsistent products, a fine-tuned segmentation model segments the product in both the original and the generated image, and the two segmentations are compared. Extensive experiments demonstrate that HFPC can effectively evaluate the quality of generated product images and significantly reduce the cost of manual annotation. Moreover, HFPC achieves state-of-the-art performance (96.4% precision) compared with other open-source visual-quality-assessment models. Dataset and code are available at: https://github.com/created-Bi/background_inpainting_products_dataset
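The product-consistency step described above (segment the product in both images, then compare) can be illustrated with a minimal sketch. It assumes the two segmentation masks are already available as equally sized binary grids, and it uses intersection-over-union (IoU) with a hypothetical threshold of 0.9 as the comparison; the abstract does not state which difference measure or threshold the authors use, so `mask_iou`, `is_consistent`, and `threshold` are illustrative names only.

```python
from typing import List

# Binary mask: 1 marks a product pixel, 0 marks background.
Mask = List[List[int]]


def mask_iou(original: Mask, generated: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    same_rows = len(original) == len(generated)
    if not same_rows or any(len(a) != len(b) for a, b in zip(original, generated)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for row_o, row_g in zip(original, generated):
        for o, g in zip(row_o, row_g):
            inter += 1 if (o and g) else 0
            union += 1 if (o or g) else 0
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else inter / union


def is_consistent(original: Mask, generated: Mask, threshold: float = 0.9) -> bool:
    """Assumed decision rule: the product is 'consistent' if IoU >= threshold."""
    return mask_iou(original, generated) >= threshold
```

In this sketch, a generated image whose product mask grows or shrinks relative to the original lowers the IoU and is filtered out once it falls below the threshold.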

Paper Structure

This paper contains 22 sections, 3 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Examples of original and AI-inpainted product images. Issues identified include (a-b) an inappropriate background (pillows should not be placed on the ground), (c-d) an inappropriate background (poor aesthetics), and (e-f), (g-h) product inconsistency between the original image and the generated image.
  • Figure 2: HFPC contains two modules working in parallel. The first module is a reward model based on the multimodal BLIP, which scores a pair of original and generated images to reflect the appropriateness of the background. The second module is a product consistency assessment model.
  • Figure 3: Reward model
  • Figure 4: Product consistency assessment model
  • Figure 5: Visualization of the clustering of product images. Features of the original images were extracted with the BLIP encoder and clustered with KMeans, yielding 25 categories. Each color represents one cluster; for example, the largest cluster is the shoe category (blue), and the smallest is the cosmetics (tubes) category (grey).
  • ...and 4 more figures
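The clustering mentioned in the Figure 5 caption can be sketched with a plain KMeans (Lloyd's algorithm) loop. This is a minimal illustration, not the authors' pipeline: the function name `kmeans`, the seeded random initialization, and the fixed iteration count are assumptions, and toy 2-D points stand in for BLIP image features (the paper uses k = 25).

```python
import random
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 20, seed: int = 0) -> List[int]:
    """Cluster points with Lloyd's algorithm and return one label per point."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points (assumed init scheme).
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

With real features, each input point would be one image embedding, and the resulting labels could then be used to balance training data across categories.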