From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models

Parmida Atighehchian; Henry Wang; Andrei Kapustin; Boris Lerner; Tiancheng Jiang; Taylor Jensen; Negin Sokhandan

TL;DR

We present a fully automated, scalable pipeline for generating marketing images of commercial products with text-to-image models. It preserves image quality and fidelity while introducing enough creative variation to adhere to marketing guidelines.

Abstract

Text-to-image models have made significant strides and now produce impressive images from textual descriptions. However, building a scalable pipeline that deploys these models in production remains a challenge. Striking the right balance between automation and human feedback is critical to maintaining both scale and quality: automation can handle large volumes, but human oversight remains essential to ensure that generated images meet the desired standards and align with the creative vision. This paper presents a new pipeline that offers a fully automated, scalable solution for generating marketing images of commercial products using text-to-image models. The proposed system maintains image quality and fidelity while introducing sufficient creative variation to adhere to marketing guidelines. By streamlining this process, we combine efficiency with human oversight, achieving a $30.77\%$ increase in marketing object fidelity using DINOv2 and a $52.00\%$ increase in human preference for the generated outputs.
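The abstract reports fidelity as a percentage increase but this excerpt does not say how the score is computed. As a minimal sketch, assuming fidelity is the cosine similarity between an embedding of the reference product and an embedding of the generated image (for example, features from a DINOv2 encoder), and that the reported gain is a relative increase over a baseline score, the arithmetic could look like the following. The function names and the toy vectors are hypothetical and stand in for real encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm vector has no direction")
    return dot / (norm_a * norm_b)


def relative_increase_pct(baseline, improved):
    """Relative change from baseline to improved, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Toy embeddings (illustrative only, not real model features).
reference = [1.0, 0.0]
generated = [1.0, 0.0]
print(cosine_similarity(reference, generated))

# Toy scores (illustrative only, not results from the paper).
print(relative_increase_pct(0.5, 0.75))
```

Under this reading, a relative increase depends on the baseline score as well as the improved one, so the same percentage can correspond to very different absolute similarity gains.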

Paper Structure (30 sections, 5 equations, 3 figures, 1 table, 1 algorithm)

This paper contains 30 sections, 5 equations, 3 figures, 1 table, 1 algorithm.

Introduction
Related Work
Fine-tuned Object Insertion and Harmonization
Training-free Insertion
Outpainting
Method
System Overview
Structured Prompt Decomposition
Multi-modal Asset Retrieval with Guardrails
Product Asset Retrieval
Background Asset Retrieval with LLM Validation
Caption Generator
Intelligent Composition Planning
Multi-modal Composition Analysis
Variation Generation Strategy
...and 15 more sections

Figures (3)

Figure 1: Overview of our pipeline. The pipeline starts by decomposing the user query. It then retrieves relevant visual assets and rewrites the initial prompts to improve image generation quality. The visual assets are scaled and placed at various positions on the canvas to maximize generation diversity. After generation, candidate images are filtered and ranked based on rubric scores and aesthetic scores.
Figure 2: Baseline vs. pipeline outputs across three models for multiple product references. Prompts span the full width below each image row.
Figure 3: Preference rates for Baseline (light blue) vs. Pipeline (dark blue) across different models.
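
The final stage described in the Figure 1 caption, where candidates are filtered and ranked by rubric and aesthetic scores, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the `min_rubric` threshold, the score ranges, and the choice to filter on the rubric score and then rank by the aesthetic score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    image_id: str
    rubric_score: float     # assumed to lie in [0, 1]
    aesthetic_score: float  # assumed to lie in [0, 1]


def filter_and_rank(candidates, min_rubric=0.5, top_k=3):
    """Drop candidates below the rubric threshold, then rank the rest.

    Assumption: the rubric score acts as a pass/fail gate and the
    aesthetic score orders the survivors, highest first.
    """
    kept = [c for c in candidates if c.rubric_score >= min_rubric]
    kept.sort(key=lambda c: c.aesthetic_score, reverse=True)
    return kept[:top_k]


# Toy candidates (illustrative scores only).
pool = [
    Candidate("a", rubric_score=0.9, aesthetic_score=0.4),
    Candidate("b", rubric_score=0.3, aesthetic_score=0.9),
    Candidate("c", rubric_score=0.7, aesthetic_score=0.8),
]
ranked = filter_and_rank(pool)
print([c.image_id for c in ranked])
```

Gating on one score before ranking on the other keeps a visually appealing image that fails the rubric from outranking a compliant one. A weighted sum of both scores would be an equally plausible reading of the caption.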
