From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models

Parmida Atighehchian, Henry Wang, Andrei Kapustin, Boris Lerner, Tiancheng Jiang, Taylor Jensen, Negin Sokhandan

TL;DR

We present a fully automated, scalable pipeline for generating marketing images of commercial products with text-to-image models. The pipeline maintains image quality and fidelity while introducing enough creative variation to adhere to marketing guidelines.

Abstract

Text-to-image models have made significant strides, producing impressive results in generating images from textual descriptions. However, building a scalable pipeline for deploying these models in production remains a challenge. Striking the right balance between automation and human feedback is critical to maintaining both scale and quality. While automation can handle large volumes, human oversight remains essential to ensure that the generated images meet the desired standards and align with the creative vision. This paper presents a new pipeline that offers a fully automated, scalable solution for generating marketing images of commercial products using text-to-image models. The proposed system maintains the quality and fidelity of images while introducing sufficient creative variation to adhere to marketing guidelines. By streamlining this process, we combine efficiency with human oversight, achieving a $30.77\%$ increase in marketing object fidelity using DINOv2 and a $52.00\%$ increase in human preference for the generated outputs.

Paper Structure (30 sections, 5 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: Overview of our pipeline. The pipeline first decomposes the user query, then retrieves relevant visual assets and rewrites the initial prompts to optimize image generation quality. The visual assets are scaled and placed at various positions on the canvas to maximize generation diversity. After generation, candidate images are filtered and ranked based on rubric scores and aesthetic scores.
  • Figure 2: Baseline vs. pipeline outputs across three models for multiple product references. Prompts span the full width below each image row.
  • Figure 3: Preference rates for Baseline (light blue) vs. Pipeline (dark blue) across different models.
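
The Figure 1 caption describes two steps that are easy to picture in code: placing a scaled product asset at several canvas positions, and filtering and ranking the generated candidates. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. Every name in it (`Placement`, `enumerate_placements`, `Candidate`, `filter_and_rank`) is hypothetical, as are the scale fractions, the grid size, the rubric threshold, and the choice to rank by rubric score first and aesthetic score second.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Placement:
    """Top-left corner and size of a product asset on the canvas, in pixels."""
    x: int
    y: int
    width: int
    height: int


def enumerate_placements(canvas_w, canvas_h, asset_w, asset_h,
                         scales=(0.3, 0.5), grid=3):
    """Scale the asset and place it on a grid x grid lattice of positions.

    Scales are fractions of the canvas width (hypothetical values). The
    asset's aspect ratio is preserved and every placement stays in bounds.
    """
    placements = []
    for scale in scales:
        w = int(canvas_w * scale)
        h = int(w * asset_h / asset_w)
        if h > canvas_h:
            continue  # skip scales that do not fit vertically
        for i, j in product(range(grid), repeat=2):
            # Spread the top-left corner evenly over the free space.
            x = (canvas_w - w) * i // (grid - 1) if grid > 1 else 0
            y = (canvas_h - h) * j // (grid - 1) if grid > 1 else 0
            placements.append(Placement(x, y, w, h))
    return placements


@dataclass(frozen=True)
class Candidate:
    image_id: str
    rubric_score: float     # assumed to lie in [0, 1]
    aesthetic_score: float  # assumed to lie in [0, 1]


def filter_and_rank(candidates, min_rubric=0.5, top_k=3):
    """Drop candidates below the rubric threshold, then rank the rest.

    Ranking by (rubric, aesthetic) is an assumption, not the paper's rule.
    """
    kept = [c for c in candidates if c.rubric_score >= min_rubric]
    kept.sort(key=lambda c: (c.rubric_score, c.aesthetic_score), reverse=True)
    return kept[:top_k]
```

In this sketch each placement would seed one generation run, and the scored outputs would then pass through `filter_and_rank`. A weighted sum of the two scores would be an equally plausible ranking rule.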