CAP: Evaluation of Persuasive and Creative Image Generation

Aysan Aghazadeh, Adriana Kovashka

TL;DR

The paper introduces three metrics, Creativity, Alignment, and Persuasiveness (CAP), to evaluate advertisement image generation from visually implicit messages. The authors develop AIM to measure image–message alignment, a creativity score C_obj, and a persuasiveness score P_comp; the metrics build on Multimodal LLMs, and an LLM-based message expansion step is used to prompt T2I models. They show that state-of-the-art T2I models struggle with creativity, persuasiveness, and alignment on implicit ads, and that expanding messages with LLMs significantly improves CAP scores across commercial and PSA ads on PittAd. The work provides a benchmark and a simple, practical augmentation strategy for producing more persuasive and creative ad images.

Abstract

We address the task of advertisement image generation and introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advancements in Text-to-Image (T2I) generation and the strong performance of T2I models in generating high-quality images from explicit descriptions, evaluating these models remains challenging. Existing evaluation methods focus largely on assessing alignment with explicit, detailed descriptions, but evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential qualities that enhance the effectiveness of advertisement images, yet are seldom measured. To address this, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text consists of implicit messages. We further introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.

Paper Structure

This paper contains 24 sections, 2 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Human-created ads (a, d) convey their message through creative and persuasive visual storytelling, blending the implicit message seamlessly into the visuals. The T2I baseline (b, e), on the other hand, depicts relevant entities but without the underlying intent. This highlights the need for better metrics for persuasiveness, creativity, and abstract text-image alignment. Ours (c, f) demonstrates how improving visual storytelling can enhance ad generation by aligning visuals with the intended message effectively. Texts on the right form the prompt to T2I models.
  • Figure 2: Overview of AIM. Orange denotes training, while blue is inference. $AR_w$ and $AR_l$ are used in training as the preferred and dis-preferred statements. $AR_m$ is the prompt for the T2I model.
  • Figure 3: Process of computing $P_{comp+AIM}$ persuasiveness score
  • Figure 4: Example of images chosen by each annotator between $I_{AR}$ (left) and $I_{LLAMA3}$ (right). For each pair of images, annotators select the image that better aligns with each $AR_m$. In each row, the value under each image indicates the score generated by the metric listed. A $\checkmark$ represents the chosen (ranked better) image, while a $\times$ indicates the rejected image. The green circle highlights agreement with human annotations in choosing the better-aligned image, and the red circle indicates disagreement.
  • Figure 5: Creativity: (d), which shows a unique but relevant object portrayal, scores best, while random (a) and generic (b) objects score worst. Red/Green shows low/high values for $C_{obj}$, low/high for $AIM$, and high/low for $Sim$. Blue denotes moderate values.
  • ...and 3 more figures