Image Aesthetics Assessment via Learnable Queries

Zhiwei Xiong, Yunfan Zhang, Zhiqi Shen, Peiran Ren, Han Yu

TL;DR

The IAA-LQ approach adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.

Abstract

Image aesthetics assessment (IAA) aims to estimate the aesthetics of images. Depending on the content of an image, diverse criteria need to be selected to assess its aesthetics. Existing works utilize pre-trained vision backbones based on content knowledge to learn image aesthetics. However, training those backbones is time-consuming and suffers from attention dispersion. Inspired by learnable queries in vision-language alignment, we propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach. It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder. Extensive experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.

Paper Structure

This paper contains 11 sections, 7 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: The design of IAA-LQ. A querying transformer learns embeddings for a set of learnable queries. Image features extracted by a frozen, pre-trained image encoder are fed into every other transformer block through cross-attention. The learned query embeddings are averaged and passed through a feed-forward layer and a softmax to output the predicted aesthetic distribution of opinion scores (DOS).
  • Figure 2: Examples of IAA-LQ MOS prediction results. From the top row to the bottom row, the images have relatively high, moderate, and relatively low ground-truth mean opinion scores (MOSs). Beneath each image, the blue number is the predicted MOS and the green number in parentheses is the ground-truth MOS.
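
The data flow described in the Figure 1 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the weights are random and untrained, attention is single-head, and every name and size (`Block`, `predict_dos`, `mos_from_dos`, 4 queries, 4 blocks, 10 score bins, feature width 32) is a hypothetical choice made for illustration. It also assumes the image features already have the same width as the queries, and that a MOS can be read off the DOS as its expected score, which the text above does not state.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(q, k, v):
    # Single-head scaled dot-product attention (simplification).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class Block:
    """One transformer block acting on the query embeddings (hypothetical)."""

    def __init__(self, dim, use_cross_attention):
        s = dim ** -0.5
        self.use_cross_attention = use_cross_attention
        self.w_self = [rng.normal(0, s, (dim, dim)) for _ in range(3)]
        self.w_cross = [rng.normal(0, s, (dim, dim)) for _ in range(3)]
        self.w_ff1 = rng.normal(0, s, (dim, 4 * dim))
        self.w_ff2 = rng.normal(0, (4 * dim) ** -0.5, (4 * dim, dim))

    def __call__(self, queries, image_feats):
        # Self-attention among the learnable queries.
        wq, wk, wv = self.w_self
        queries = layer_norm(
            queries + attention(queries @ wq, queries @ wk, queries @ wv))
        # Cross-attention: queries attend to the frozen image features.
        if self.use_cross_attention:
            wq, wk, wv = self.w_cross
            queries = layer_norm(
                queries + attention(queries @ wq, image_feats @ wk, image_feats @ wv))
        # Position-wise feed-forward layer with ReLU.
        hidden = np.maximum(queries @ self.w_ff1, 0.0)
        return layer_norm(queries + hidden @ self.w_ff2)

def predict_dos(image_feats, queries, blocks, w_head, b_head):
    for block in blocks:
        queries = block(queries, image_feats)
    pooled = queries.mean(axis=0)              # average the query embeddings
    return softmax(pooled @ w_head + b_head)   # distribution over score bins

def mos_from_dos(dos):
    # Assumption: bins correspond to scores 1..K and MOS is the expected score.
    scores = np.arange(1, dos.shape[-1] + 1)
    return float((dos * scores).sum())

# Illustrative sizes only; none of these come from the paper.
dim, num_queries, num_patches, num_bins = 32, 4, 16, 10
image_feats = rng.normal(size=(num_patches, dim))  # stand-in for frozen encoder output
queries = rng.normal(size=(num_queries, dim))      # learnable in a real model
# Image features enter every other block (blocks 0 and 2 here).
blocks = [Block(dim, use_cross_attention=(i % 2 == 0)) for i in range(4)]
w_head = rng.normal(0, dim ** -0.5, (dim, num_bins))
b_head = np.zeros(num_bins)

dos = predict_dos(image_feats, queries, blocks, w_head, b_head)
mos = mos_from_dos(dos)
print("DOS:", np.round(dos, 3))
print("MOS:", round(mos, 3))
```

In a trained model only the queries, the blocks, and the output head would be updated, while the encoder that produces `image_feats` stays frozen. That frozen encoder is what the caption means by pre-trained image features.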