From Concepts to Judgments: Interpretable Image Aesthetic Assessment

Xiao-Chang Liu, Johan Wagemans

Abstract

Image aesthetic assessment (IAA) aims to predict the aesthetic quality of images as perceived by humans. While recent IAA models achieve strong predictive performance, they offer little insight into the factors driving their predictions. Yet for users, understanding why an image is considered pleasing or not is as valuable as the score itself, motivating growing interest in interpretability within IAA. When humans evaluate aesthetics, they naturally rely on high-level cues to justify their judgments. Motivated by this observation, we propose an interpretable IAA framework grounded in human-understandable aesthetic concepts. We learn these concepts in an accessible manner, constructing a subspace that forms the foundation of an inherently interpretable model. To capture nuanced influences on aesthetic perception beyond explicit concepts, we introduce a simple yet effective residual predictor. Experiments on photographic and artistic datasets demonstrate that our method achieves competitive predictive performance while offering transparent, human-understandable aesthetic judgments.

Paper Structure (12 sections, 7 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Methods overview. We predict the aesthetic score of an image using human-understandable concepts and an inherently interpretable sparse linear model. (a) Aesthetic Concept Learning: Given an aesthetic concept $s$ (e.g., rule of thirds) and user-defined positive/negative image sets, we learn its Concept Activation Vector (CAV) $\mathbf{c}_s\in\mathbb{R}^d$ by training a linear Support Vector Machine (SVM) to distinguish between image embeddings from the two sets. The learned CAV is orthogonal to the decision boundary. (b) Concept Subspace Construction: We aggregate the CAVs of $N_c$ aesthetic concepts to form a concept subspace $\mathbf{C}\in\mathbb{R}^{d \times N_c}$. (c) Interpretable Aesthetic Assessment: For a given image, we extract its embedding using a pre-trained image encoder and project it onto the concept subspace. The resulting concept projection is used by an inherently interpretable sparse linear model to predict the aesthetic score. To account for nuanced influences on aesthetic judgment beyond explicit concepts, we add a residual predictor that complements the interpretable core.
  • Figure 2: Learned aesthetic concept weights on the AADB dataset, ordered by importance (corresponding to $\mathbf{w}$ in the sparse linear model). The bias term is $0.538$ (corresponding to $b$ in the same model).
  • Figure 3: Aesthetic score prediction on an AADB test image. The bottom left shows the ground truth (GT), our interpretable prediction (Interp. pred.), and our hybrid prediction (Hybrid pred.). The right side shows the image's projection onto the learned concept subspace.
  • Figure 4: Learned weights of aesthetic concepts on the PARA dataset. The learned bias term is $3.017$.
  • Figure 5: Aesthetic score prediction on a PARA test image. The bottom left shows the ground truth and our predictions. The right side shows the image's projection onto the learned concept subspace.
  • ...and 3 more figures
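
The three stages described in the Figure 1 caption can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration: the 3-dimensional toy embeddings, the concept names, the weights and bias, the subgradient-descent SVM trainer, and the `toy_residual` placeholder are hypothetical and do not come from the paper. In practice the embeddings would come from a pre-trained image encoder, the SVM from a library implementation, and the sparse weights from training on aesthetic scores.

```python
import math
import random


def train_linear_svm(pos, neg, epochs=100, lr=0.1, reg=0.01, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent."""
    rng = random.Random(seed)
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # L2 shrinkage
            if margin < 1.0:  # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def learn_cav(pos, neg):
    """(a) CAV: unit normal of the SVM decision boundary."""
    w, _ = train_linear_svm(pos, neg)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def project(embedding, cavs):
    """(c) Concept projection: one dot product per CAV."""
    return [sum(ci * zi for ci, zi in zip(c, embedding)) for c in cavs]


def interpretable_score(embedding, cavs, weights, bias):
    """Linear score plus the per-concept contributions that explain it."""
    activations = project(embedding, cavs)
    contributions = [w * a for w, a in zip(weights, activations)]
    return bias + sum(contributions), contributions


def toy_residual(embedding):
    """Hypothetical stand-in for the residual predictor (not the paper's)."""
    return 0.05 * embedding[2]


# Toy positive/negative embeddings for two hypothetical concepts.
pos_a = [[1.0, 0.1, 0.0], [1.2, -0.2, 0.1], [0.9, 0.0, -0.1]]
neg_a = [[-1.0, 0.1, 0.0], [-1.1, -0.1, 0.1], [-0.8, 0.2, -0.1]]
pos_b = [[0.1, 1.0, 0.0], [-0.2, 1.1, 0.1], [0.0, 0.9, -0.1]]
neg_b = [[0.1, -1.0, 0.1], [-0.1, -0.9, 0.0], [0.2, -1.2, -0.1]]

cav_a = learn_cav(pos_a, neg_a)
cav_b = learn_cav(pos_b, neg_b)
cavs = [cav_a, cav_b]  # (b) concept subspace: one CAV per concept

z = [0.6, -0.4, 0.2]   # hypothetical image embedding
weights = [0.3, -0.1]  # hypothetical concept weights
bias = 0.5             # hypothetical bias

activations = project(z, cavs)
score, contributions = interpretable_score(z, cavs, weights, bias)
hybrid = score + toy_residual(z)
```

Because the interpretable score is a plain weighted sum, each entry of `contributions` can be read directly as how much one concept raised or lowered the prediction, which is the property the caption relies on. The residual term is added on top and carries no such per-concept reading.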