Explaining Automatic Image Assessment

Max Lisaius, Scott Wehrwein

TL;DR

This work tackles explainability in automatic image aesthetics by introducing perceptual modalities (depth, saliency, blur) and reimplementing NIMA with enhanced loss terms to produce distribution-aware predictions. By training modality-specific models and applying transfer learning, the authors quantify how different visual cues relate to aesthetic judgments and provide covariance analyses with established categories. Saliency emerges as the strongest modality across metrics, suggesting that attention-focused cues underpin many aesthetic judgments. The study demonstrates improved correlation-based metrics over NIMA and offers a framework for automatic, explainable aesthetic analysis applicable to multiple datasets and potential future modalities.

Abstract

Previous work in aesthetic categorization and explainability relies on manual labeling and classification to explain aesthetic scores. These methods require a complex labeling process and are consequently limited in scale. Our proposed approach instead explains aesthetic assessment models by visualizing dataset trends and automatically categorizing visual aesthetic features, which we achieve by training neural networks on different versions (modalities) of the same dataset. Evaluating the model adapted to each modality with existing and novel metrics lets us capture and visualize aesthetic features and trends.
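
As a rough illustration of what "distribution-aware predictions" means in the TL;DR, the sketch below shows the baseline earth mover's distance (EMD) loss from the original NIMA formulation, which compares a predicted score histogram against the ground-truth rating histogram through their cumulative distributions. It does not reproduce the enhanced loss terms of this work, which are not specified in this summary. The function names `emd_loss` and `mean_score`, the NumPy implementation, and the 10-bucket example are assumptions made for illustration only.

```python
import numpy as np


def emd_loss(p_true, p_pred, r=2):
    """EMD between discrete score distributions, as in baseline NIMA (r=2).

    Both inputs are histograms over ordered score buckets that sum to 1
    along the last axis. This is an illustrative sketch, not the
    enhanced loss used in the summarized work.
    """
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    # Compare cumulative distributions so that the bucket order matters.
    cdf_true = np.cumsum(p_true, axis=-1)
    cdf_pred = np.cumsum(p_pred, axis=-1)
    return np.mean(np.abs(cdf_true - cdf_pred) ** r, axis=-1) ** (1.0 / r)


def mean_score(p, scores=None):
    """Expected score of a histogram; buckets default to 1..N (assumption)."""
    p = np.asarray(p, dtype=float)
    if scores is None:
        scores = np.arange(1, p.shape[-1] + 1)
    return np.sum(p * scores, axis=-1)


# Hypothetical 10-bucket example: a uniform rating histogram.
uniform = np.full(10, 0.1)
print(mean_score(uniform))
print(emd_loss(uniform, uniform))
```

Because the loss is computed on cumulative distributions, a prediction that puts its mass one bucket away from the truth is penalized less than one that puts it nine buckets away, which a plain cross-entropy loss would not distinguish.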

Paper Structure

This paper contains 16 sections, 4 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: A breakdown of a source image into different modalities
  • Figure 2: An illustration of the variable layer freezing methods used for transfer learning for early modality training.
  • Figure 3: Blur Examples
  • Figure 4: Blur Details
  • Figure 5: Depth Examples
  • ...and 10 more figures