Multi-task convolutional neural network for image aesthetic assessment
Derya Soydaner, Johan Wagemans
TL;DR
This work tackles image aesthetic assessment by treating it as a regression problem and introducing a simple, end-to-end multi-task CNN that jointly predicts an image's overall aesthetic score and multiple attribute scores. Built on a VGG16 backbone with a shared head, the model achieves state-of-the-art performance on the AADB dataset and establishes a new baseline on EVA while using far fewer parameters than prior approaches. Across two benchmarks, the multi-task setup consistently outperforms single-task variants, and fine-tuning strengthens attribute predictions and overall scores, with Grad-CAM visualizations offering interpretability. The results demonstrate that exploiting related aesthetic attributes in a joint framework yields both improved predictive accuracy and efficiency, and cross-dataset tests indicate meaningful generalization capabilities for future research in computational aesthetics.
Abstract
As people's aesthetic preferences for images are far from understood, image aesthetic assessment is a challenging artificial intelligence task. The range of factors underlying this task is almost unlimited, but we know that some aesthetic attributes affect those preferences. In this study, we present a multi-task convolutional neural network that takes into account these attributes. The proposed neural network jointly learns the attributes along with the overall aesthetic scores of images. This multi-task learning framework allows for effective generalization through the utilization of shared representations. Our experiments demonstrate that the proposed method outperforms the state-of-the-art approaches in predicting overall aesthetic scores for images in one benchmark of image aesthetics. We achieve near-human performance in terms of overall aesthetic scores when considering the Spearman's rank correlations. Moreover, our model pioneers the application of multi-tasking in another benchmark, serving as a new baseline for future research. Notably, our approach achieves this performance while using fewer parameters compared to existing multi-task neural networks in the literature, and consequently makes our method more efficient in terms of computational complexity.
