Explaining Automatic Image Assessment
Max Lisaius, Scott Wehrwein
TL;DR
This work tackles explainability in automatic image aesthetic assessment by introducing perceptual modalities (depth, saliency, blur) and reimplementing NIMA with enhanced loss terms to produce distribution-aware predictions. By training modality-specific models and applying transfer learning, the authors quantify how different visual cues relate to aesthetic judgments and analyze how the resulting predictions covary with established aesthetic categories. Saliency emerges as the strongest modality across metrics, suggesting that attention-focused cues underpin many aesthetic judgments. The study reports improvements over NIMA on correlation-based metrics and offers a framework for automatic, explainable aesthetic analysis that can be applied to multiple datasets and extended to future modalities.
Abstract
Previous work in aesthetic categorization and explainability relies on manual labeling and classification to explain aesthetic scores. These methods require a complex labeling process and are limited in scale. Our proposed approach attempts to explain aesthetic assessment models by visualizing dataset trends and automatically categorizing visual aesthetic features, which we do by training neural networks on different versions of the same dataset. By evaluating the models adapted to each modality with existing and novel metrics, we can capture and visualize aesthetic features and trends.
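To make the "distribution-aware predictions" and "correlation-based metrics" mentioned above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard NIMA setup, in which a model predicts a probability distribution over discrete score buckets and is trained with an Earth Mover's Distance (EMD) loss with r = 2. The bucket count of 10 (scores 1 to 10), the function names, and the simple tie-free rank correlation are illustrative assumptions; the enhanced loss terms of this work are not reproduced here.

```python
import numpy as np

def emd_loss(p_true, p_pred, r=2):
    """EMD between two discrete score distributions, computed on their CDFs.

    Follows the form used by NIMA: (mean_k |CDF_true(k) - CDF_pred(k)|^r)^(1/r).
    """
    cdf_true = np.cumsum(p_true, axis=-1)
    cdf_pred = np.cumsum(p_pred, axis=-1)
    return np.mean(np.abs(cdf_true - cdf_pred) ** r, axis=-1) ** (1.0 / r)

def mean_score(p):
    """Expected score of a distribution over buckets 1..K (assumed bucket values)."""
    p = np.asarray(p, dtype=float)
    scores = np.arange(1, p.shape[-1] + 1)
    return np.sum(p * scores, axis=-1)

def pearson(x, y):
    """Linear correlation between predicted and ground-truth mean scores."""
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    """Rank correlation: Pearson on ranks. Assumes no ties, for simplicity."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]
```

Under these assumptions, a modality-specific model (for example, one trained on saliency maps) would be compared with a baseline by computing `mean_score` on both the predicted and the ground-truth distributions for each image, then reporting `pearson` and `spearman` over the test set alongside the average `emd_loss`.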
