
AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling

Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Misha Sra, Pradeep Sen

TL;DR

The paper addresses the gap between image aesthetics and content-driven appeal by introducing image-content appeal assessment (ICAA) and the AID-AppEAL pipeline, which automatically generates large ICAA datasets. The pipeline combines domain-relevancy mapping, synthetic data generation with diffusion models and Textual Inversion, a Siamese CLIP-based relative comparator, and an absolute appeal estimator, followed by a heatmap-guided, depth-aware enhancement method. The two domain-specific datasets (food and room interiors) reveal little correlation between appeal and aesthetics, and in the user study more than 76% of participants prefer the appeal-enhanced images, which supports the approach. This work enables scalable ICAA data creation, appeal estimation, and localized content-appeal enhancement, with practical implications for food, interior design, and related industries. The formulations $A(\cdot)$ and $M_D^H(I)$ underpin the methodology.

Abstract

We propose Image Content Appeal Assessment (ICAA), a novel metric that quantifies the level of positive interest an image's content generates for viewers, such as the appeal of food in a photograph. This is fundamentally different from traditional Image-Aesthetics Assessment (IAA), which judges an image's artistic quality. While previous studies often confuse the concepts of "aesthetics" and "appeal," our work addresses this by being the first to study ICAA explicitly. To do this, we propose a novel system that automates dataset creation and implements algorithms to estimate and boost content appeal. We use our pipeline to generate two large-scale datasets (70K+ images each) in diverse domains (food and room interior design) to train our models, which revealed little correlation between content appeal and aesthetics. Our user study, with more than 76% of participants preferring the appeal-enhanced images, confirms that our appeal ratings accurately reflect user preferences, establishing ICAA as a unique evaluative criterion. Our code and datasets are available at https://github.com/SherryXTChen/AID-Appeal.

Paper Structure (30 sections, 5 equations, 19 figures, 1 table)

Figures (19)

  • Figure 1: Image-content appeal assessment (ICAA) and enhancement. The $1^{st}$/$4^{th}$ columns show amateur photos lacking artistic appeal, while the $2^{nd}$/$5^{th}$ columns feature professionally taken images of less appealing content (a moldy burger and a dirty room). Because of their superior aesthetics, IAA baselines (DIAA [kong2016photo], MPADA [sheng2018attention], and NIMA [talebi2018nima]) rate the professional images higher even though their content is less appealing (lowest scores underlined, highest in bold), while our ICAA estimator accurately assesses and enhances content appeal ($2^{nd}$/$5^{th}$ to $3^{rd}$/$6^{th}$ columns).
  • Figure 2: Domain-relevancy map generation. Given an image, we use BLIP [li2022blip] to estimate its description and extract all noun phrases $\mathbb{P}$ using NLTK [bird2009natural]. For every phrase, we look up each of its words in WordNet [brown2005encyclopedia] to get their lexnames and keep the phrase if any of them matches the domain $D$ (e.g., if $D$ is food, the phrase is kept only if at least one word's lexname is noun.food). The resulting set of phrases is $\mathbb{P}_D$, and we use CLIPSeg [lueddecke22_cvpr] to create a segmentation map that locates the objects described by each phrase in $\mathbb{P}_D$. Together, these maps define the image region that contains objects from $D$, which we call the domain-relevancy map.
  • Figure 3: Synthetic dataset creation. Given an image $I$, its text description, and its domain-relevancy map $M_D(I)$, we first locate the "background" regions $1-M_D(I)$, which should have minimal effect on content appeal, and augment them using Stable Diffusion [rombach2021highresolution] (\ref{eq:DiversifyingBackgroundSynthesis}). We then use Textual Inversion [gal2022textual] to generate appealing/unappealing-content embeddings, which can change image content appeal with respect to $M_D(I)$ (\ref{eq:VaryingAppealSynthesis}).
  • Figure 4: Relative and absolute content appeal estimation. We use the CLIP [radford2021learning] image encoder, followed by several fully connected (FC) layers, to predict the relative content appeal difference between images (\ref{fig:comparator_pipeline}) and the absolute appeal of an image (\ref{fig:appeal_score_predictor}).
  • Figure 5: Image content appeal heatmap generation. We slide a window over the image to capture overlapping patches and use the content appeal estimator to score each patch. The heatmap value at each pixel is the average score over all patches that include that pixel; we normalize all values and take their inverse, so a lighter color means the content in that region is less appealing.
  • ...and 14 more figures
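
The phrase-filtering rule in the Figure 2 caption (keep a noun phrase if any of its words has a lexname matching the domain) can be sketched as follows. This is a minimal illustration, not the authors' code: `filter_domain_phrases`, `toy_lexnames`, and the `TOY_LEXNAMES` table are hypothetical names, and the table is a hand-written stand-in for a real WordNet lookup. With NLTK installed, one would presumably collect `synset.lexname()` over `nltk.corpus.wordnet.synsets(word)` instead.

```python
from typing import Callable, Dict, List, Set

# Toy stand-in for a WordNet lookup (word -> lexnames of its senses).
# These entries are illustrative assumptions, not real WordNet output.
TOY_LEXNAMES: Dict[str, Set[str]] = {
    "burger": {"noun.food"},
    "cheese": {"noun.food"},
    "table": {"noun.artifact"},
    "room": {"noun.artifact"},
}


def toy_lexnames(word: str) -> Set[str]:
    """Return the lexnames recorded for a word, or an empty set if unknown."""
    return TOY_LEXNAMES.get(word.lower(), set())


def filter_domain_phrases(
    phrases: List[str],
    domain_lexname: str,
    lexnames_of: Callable[[str], Set[str]] = toy_lexnames,
) -> List[str]:
    """Keep a phrase if at least one of its words has the domain's lexname."""
    kept = []
    for phrase in phrases:
        words = phrase.split()
        if any(domain_lexname in lexnames_of(word) for word in words):
            kept.append(phrase)
    return kept


phrases = ["a moldy burger", "a wooden table", "melted cheese"]
print(filter_domain_phrases(phrases, "noun.food"))
# prints ['a moldy burger', 'melted cheese']
```

Passing the lookup as a function keeps the filtering rule separate from the lexical resource, so the toy table can be swapped for a real WordNet query without touching the rule itself.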
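
The heatmap procedure in the Figure 5 caption (score overlapping patches, average per pixel, normalize, invert) can be sketched as below. This is a dependency-free illustration under stated assumptions: the image is a grid of floats, `mean_value` is a toy stand-in for the content appeal estimator, normalization is min-max, and "inverse" is read as `1 - value`. The caption does not specify these details, and the window size and stride are arbitrary.

```python
from typing import Callable, List

Grid = List[List[float]]


def window_origins(length: int, window: int, stride: int) -> List[int]:
    """Start offsets of sliding windows, with a last window flush to the border."""
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)
    return origins


def mean_value(patch: Grid) -> float:
    """Toy scorer (assumption): stands in for the learned appeal estimator."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def appeal_heatmap(
    image: Grid,
    score_patch: Callable[[Grid], float],
    window: int,
    stride: int,
) -> Grid:
    """Average patch scores per pixel, min-max normalize, then invert."""
    height, width = len(image), len(image[0])
    if stride < 1 or window < 1 or window > min(height, width):
        raise ValueError("window must fit in the image and stride must be >= 1")
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in window_origins(height, window, stride):
        for left in window_origins(width, window, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            score = score_patch(patch)
            # Every pixel inside the window accumulates this patch's score.
            for y in range(top, top + window):
                for x in range(left, left + window):
                    total[y][x] += score
                    count[y][x] += 1
    mean = [[total[y][x] / count[y][x] for x in range(width)] for y in range(height)]
    low = min(min(row) for row in mean)
    high = max(max(row) for row in mean)
    if high == low:
        # Flat score map: nothing stands out (assumed convention).
        return [[0.0] * width for _ in range(height)]
    # After inversion, values near 1 mark the least appealing regions.
    return [[1.0 - (v - low) / (high - low) for v in row] for row in mean]


# Left half scores high, right half scores low under the toy scorer.
image = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
heat = appeal_heatmap(image, mean_value, window=2, stride=1)
print(heat[0])
```

With a stride smaller than the window, neighbouring patches overlap, so pixels near the boundary between the two halves receive intermediate values rather than a hard edge.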