Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks

Zhichao Yang; Jianjie Wang; Zhixianhe Zhang; Pangu Xie; Xiangfei Sheng; Pengfei Chen; Leida Li

TL;DR

This paper proposes FGAesQ, a novel IAA framework that learns discriminative aesthetic scores from relative ranks through Difference-preserved Tokenization (DiffToken), Comparative Text-assisted Alignment (CTAlign), and Rank-aware Regression (RankReg). FGAesQ enables accurate aesthetic assessment in fine-grained scenarios while maintaining competitive performance in coarse-grained evaluation.

Abstract

Image aesthetic assessment (IAA) has extensive applications in areas such as content creation, album management, and recommendation systems. In such applications, users often need to pick the most aesthetically pleasing image from a series of images with subtle aesthetic variations, a task we refer to as fine-grained IAA. Unfortunately, state-of-the-art IAA models are typically designed for coarse-grained evaluation, where images with notable aesthetic differences are evaluated independently on an absolute scale. These models are inherently limited in discriminating fine-grained aesthetic differences. To address this limitation, we contribute FGAesthetics, a fine-grained IAA database with 32,217 images organized into 10,028 series, which are sourced from diverse categories including Natural, AIGC, and Cropping. Annotations are collected via pairwise comparisons within each series. We also devise Series Refinement and Rank Calibration to ensure the reliability of data and labels. Based on FGAesthetics, we further propose FGAesQ, a novel IAA framework that learns discriminative aesthetic scores from relative ranks through Difference-preserved Tokenization (DiffToken), Comparative Text-assisted Alignment (CTAlign), and Rank-aware Regression (RankReg). FGAesQ enables accurate aesthetic assessment in fine-grained scenarios while still maintaining competitive performance in coarse-grained evaluation. Extensive experiments and comparisons demonstrate the superiority of the proposed method.
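
To make the annotation idea concrete, the following is a minimal sketch of how pairwise comparisons within one series could be turned into a ranking by counting wins. It is an illustration only: the function name `rank_series`, the `(winner, loser)` tuple format, and the win-count rule are assumptions, not the Rank Calibration procedure described in the paper.

```python
from collections import defaultdict


def rank_series(image_ids, comparisons):
    """Order the images of one series from most to least preferred.

    image_ids:   list of image identifiers in the series (hypothetical format).
    comparisons: list of (winner, loser) pairs from pairwise annotation.
    Ordering rule (an assumption): more pairwise wins ranks higher;
    ties are broken by identifier so the output is deterministic.
    """
    wins = defaultdict(int)
    for winner, _loser in comparisons:
        wins[winner] += 1
    return sorted(image_ids, key=lambda image_id: (-wins[image_id], image_id))


if __name__ == "__main__":
    series = ["a", "b", "c"]
    votes = [("c", "a"), ("c", "b"), ("a", "b")]
    print(rank_series(series, votes))
```

A plain win count ignores annotator disagreement and indistinguishable pairs, which the abstract indicates are handled separately through Rank Calibration.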

Paper Structure (32 sections, 6 equations, 11 figures, 8 tables)

Introduction
Related Work
Image Aesthetic Assessment Datasets
Image Aesthetic Assessment Models
FGAesthetics
Data Collection
Series Refinement
Rank Calibration
Statistics of FGAesthetics
FGAesQ
Difference-preserved Tokenization
Comparative Text-assisted Alignment
Rank-aware Regression
Training of FGAesQ
Experiments
...and 17 more sections

Figures (11)

Figure 1: Which image is aesthetically more pleasing? We introduce FGAesthetics, a benchmark for Fine-grained Image Aesthetic Assessment that discriminates subtle aesthetic differences among similar images in photo series. It differs from previous coarse-grained datasets, e.g., AVA, where images with notable differences are evaluated independently on an absolute scale. This enables FGAesQ, a novel IAA model, to achieve accurate aesthetic assessment in fine-grained scenarios while maintaining robust coarse-grained evaluation.
Figure 2: Overview of the construction pipeline for FGAesthetics. The pipeline consists of three stages: (a) Data Collection. Visually similar photo series are collected from three distinct sources: Natural, AIGC, and Cropping. (b) Series Refinement. Noisy series data undergo rigorous filtering using a Metric-MLLMs-Human refinement protocol. (c) Rank Calibration. Pairwise comparisons are annotated within each series, and data that cannot be aesthetically distinguished are excluded, yielding calibrated aesthetic rankings.
Figure 3: Statistical analysis of FGAesthetics. (a) Image and series count distribution across Natural, AIGC, and Cropping. (b) Within-series similarity distributions measured by LPIPS (low-level) [zhang2018unreasonable], DreamSim (mid-level) [fu2023dreamsim], and CLIPScore (high-level) [hessel2021clipscore] across three sources. Note that LPIPS and DreamSim are subtracted from 1, ensuring consistent polarity with CLIPScore, where higher values indicate greater similarity.
Figure 4: Overall pipeline of the proposed FGAesQ. FGAesQ learns discriminative aesthetic scores from relative ranks through three modules: (a) Difference-preserved Tokenization (DiffToken) selectively maintains difference regions at their original resolution while downscaling others. (b) Comparative Text-assisted Alignment (CTAlign) achieves distinctive aesthetic visual representations. (c) Rank-aware Regression (RankReg) rectifies the coarse-grained score regression with fine-grained aesthetic rankings.
Figure 5: Visualization of evaluation results on three test series. Images are arranged left to right in order of decreasing aesthetic quality. Red boxes and text indicate the image with the best aesthetics.
...and 6 more figures
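
The Figure 4 caption says that RankReg rectifies coarse-grained score regression with fine-grained rankings. One generic way to combine the two signals, shown here only as a hedged sketch, is to add a pairwise margin (hinge) ranking term to a mean-squared-error regression term. The function name `rank_aware_loss`, the `margin` and `weight` parameters, and the loss form itself are assumptions for illustration; the paper's actual RankReg formulation may differ.

```python
def rank_aware_loss(pred, target, series_ranks, margin=0.1, weight=1.0):
    """Regression loss plus a pairwise ranking penalty for one series.

    pred, target:  lists of predicted and reference scores (floats).
    series_ranks:  rank of each image within the series, 0 = best.
    margin, weight: hypothetical hyperparameters of this sketch.
    """
    n = len(pred)
    # Coarse-grained term: mean squared error against the reference scores.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n

    # Fine-grained term: for every pair where image i is ranked better
    # than image j, penalize predictions that do not score i at least
    # `margin` above j.
    hinge_sum, num_pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if series_ranks[i] < series_ranks[j]:
                hinge_sum += max(0.0, margin - (pred[i] - pred[j]))
                num_pairs += 1
    rank_term = hinge_sum / num_pairs if num_pairs else 0.0

    return mse + weight * rank_term


if __name__ == "__main__":
    # Scores that respect the ranking with enough margin incur no penalty.
    print(rank_aware_loss([0.9, 0.5, 0.1], [0.9, 0.5, 0.1], [0, 1, 2]))
    # Identical scores for differently ranked images are penalized.
    print(rank_aware_loss([0.5, 0.5], [0.5, 0.5], [0, 1]))
```

The ranking term only constrains the order of scores inside a series, so the regression term is still needed to keep predictions on the absolute scale used for coarse-grained evaluation.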
