Multi-task convolutional neural network for image aesthetic assessment

Derya Soydaner, Johan Wagemans

TL;DR

This work treats image aesthetic assessment as a regression problem and introduces a simple, end-to-end multi-task CNN that jointly predicts an image's overall aesthetic score and multiple attribute scores. Built on a VGG16 backbone with a shared head, the model achieves state-of-the-art performance on the AADB dataset and establishes a new baseline on EVA while using far fewer parameters than prior approaches. On both benchmarks, the multi-task setup consistently outperforms single-task variants, fine-tuning improves both the attribute predictions and the overall scores, and Grad-CAM visualizations offer interpretability. The results indicate that jointly learning related aesthetic attributes improves both predictive accuracy and efficiency, and cross-dataset tests indicate meaningful generalization across datasets.
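To make the joint-regression idea concrete, here is a minimal sketch of one shared output layer that emits the overall score and the attribute scores together, trained against a summed per-task error. The summary above does not state the loss or the exact head layout, so the equal-weight mean squared error, the single linear layer, and the names `shared_head` and `multitask_mse` are all assumptions made for illustration; the feature vector stands in for the output of the VGG16 backbone.

```python
from typing import List, Sequence, Tuple


def shared_head(features: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Hypothetical shared head: one linear layer with one output per task.

    Output 0 is taken to be the overall aesthetic score and the remaining
    outputs the attribute scores (an assumed ordering).
    """
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def multitask_mse(predictions: Sequence[Sequence[float]],
                  targets: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return (total loss, per-task losses) for a batch.

    Each per-task loss is the mean squared error over the batch; the total
    is their unweighted sum (equal task weights are an assumption).
    """
    n_images = len(predictions)
    n_tasks = len(predictions[0])
    per_task = [
        sum((p[t] - y[t]) ** 2 for p, y in zip(predictions, targets)) / n_images
        for t in range(n_tasks)
    ]
    return sum(per_task), per_task


# Toy example: 2 backbone features, 3 tasks (overall score + 2 attributes).
features = [1.0, 2.0]
weights = [[0.5, 0.0], [0.0, 0.25], [1.0, -1.0]]
bias = [0.0, 0.1, 0.0]
scores = shared_head(features, weights, bias)
total, per_task = multitask_mse([scores], [[0.5, 0.1, 0.0]])
```

Because every task reads the same features, the gradient of the summed loss with respect to the backbone combines the signal from all tasks, which is the mechanism by which attribute labels can help the overall score.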

Abstract

As people's aesthetic preferences for images are far from understood, image aesthetic assessment is a challenging artificial intelligence task. The range of factors underlying this task is almost unlimited, but we know that some aesthetic attributes affect those preferences. In this study, we present a multi-task convolutional neural network that takes these attributes into account. The proposed neural network jointly learns the attributes along with the overall aesthetic scores of images. This multi-task learning framework allows for effective generalization through the use of shared representations. Our experiments demonstrate that the proposed method outperforms the state-of-the-art approaches in predicting overall aesthetic scores for images in one benchmark of image aesthetics. We achieve near-human performance on overall aesthetic scores as measured by Spearman's rank correlation. Moreover, our model pioneers the application of multi-tasking in another benchmark, serving as a new baseline for future research. Notably, our approach achieves this performance while using fewer parameters than existing multi-task neural networks in the literature, which makes our method more efficient in terms of computational complexity.
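Since the headline metric is Spearman's rank correlation between predicted and ground-truth scores, a small standard-library sketch of that metric may help. It ranks both score lists (tied values share their average rank) and then computes the Pearson correlation of the ranks. The function names and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every item tied with the one at `start`.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(actual))


# Toy scores for four images (made-up values for illustration only).
actual_scores = [0.2, 0.5, 0.9, 0.4]
predicted_scores = [0.3, 0.45, 0.7, 0.5]
rho = spearman_rho(predicted_scores, actual_scores)  # 0.8 for these values
```

Because the metric depends only on rank order, a model can score well even when its predicted values are offset or rescaled relative to the human ratings, as long as it orders the images the same way.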

Paper Structure

This paper contains 19 sections, 2 equations, 14 figures, 11 tables.

Figures (14)

  • Figure 1: The general architecture of our multi-task convolutional neural network.
  • Figure 2: Example images from the training set of the AADB dataset. Each image has an overall aesthetic score and scores for 11 attributes. (Left) High aesthetic: An image rated high on overall aesthetic score. (Right) Low aesthetic: An image rated low on overall aesthetic score.
  • Figure 3: Visualization of image attribute data in the training set of the AADB dataset, illustrating the distribution of negative, null, and positive levels for each attribute (Kong et al., 2016).
  • Figure 4: Example images from the training set of the EVA dataset. Each image has an overall aesthetic score and scores for 4 attributes. (Left) High aesthetic: An image rated high on overall aesthetic score. (Right) Low aesthetic: An image rated low on overall aesthetic score.
  • Figure 5: Visualization of model predictions: A scatter plot comparing the actual overall aesthetic scores of test images in the AADB dataset to the predicted scores generated by our multi-task CNN.
  • ...and 9 more figures