CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh, Adriana Kovashka
TL;DR
CAP introduces three metrics (Creativity, Alignment, and Persuasiveness) to evaluate advertisement image generation from visually implicit messages. The authors develop AIM to measure image–message alignment, a creativity score C_obj, and a persuasiveness score P_comp, and they combine Multimodal LLMs with LLM-based message expansion to prompt T2I models. They show that state-of-the-art T2I models struggle with creativity, persuasiveness, and alignment on implicit ads, and that expanding messages with LLMs significantly improves all three CAP metrics across commercial and PSA ads on PittAd. The work provides a benchmark and a simple, practical augmentation strategy for producing more persuasive and creative ad images.
Abstract
We address the task of advertisement image generation and introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advances in Text-to-Image (T2I) generation and the strong performance of T2I models in generating high-quality images from explicit descriptions, evaluating these models remains challenging. Existing evaluation methods focus largely on assessing alignment with explicit, detailed descriptions, but evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential qualities that enhance the effectiveness of advertisement images, yet they are seldom measured. To address this, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text is an implicit message. We further introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.
