Complexity in Complexity: Understanding Visual Complexity Through Structure, Color, and Surprise

Karahan Sarıtaş, Peter Dayan, Tingke Shen, Surabhi S Nath

TL;DR

The paper addresses how humans perceive visual complexity and argues that interpretable, segmentation-based cues are insufficient on their own. It introduces three features to capture structural, chromatic, and holistic information: Multi-Scale Sobel Gradient (MSG) for structure, Multi-Scale Unique Color (MUC) for colorfulness, and surprise scores derived from a Large Language Model. These are tested on datasets including a new Surprising Visual Genome (SVG). Using linear regression with cross-validated evaluation, the authors show that these features add explanatory power beyond segmentation counts, and that surprise provides a distinct, dataset-agnostic cue that improves predictions on SVG and complements MSG and MUC. The final model achieves state-of-the-art or near state-of-the-art performance across multiple datasets while preserving interpretability, and the SVG experiments demonstrate a meaningful link between surprise and perceived complexity. Overall, the work highlights the need for a multifaceted approach that combines low-level perceptual features with semantic and cognitive cues to robustly predict visual complexity across diverse imagery.

Abstract

Understanding how humans perceive visual complexity is a key area of study in visual cognition. Previous approaches to modeling visual complexity assessments have often resulted in intricate, difficult-to-interpret algorithms that employ numerous features or sophisticated deep learning architectures. While these complex models achieve high performance on specific datasets, they often sacrifice interpretability, making it challenging to understand the factors driving human perception of complexity. Recently, Shen et al. (2024) proposed an interpretable segmentation-based model that accurately predicted complexity across various datasets, supporting the idea that complexity can be explained simply. In this work, we investigate the failure of their model to capture structural, color, and surprisal contributions to complexity. To this end, we propose Multi-Scale Sobel Gradient (MSG), which measures spatial intensity variations; Multi-Scale Unique Color (MUC), which quantifies colorfulness across multiple scales; and surprise scores generated using a Large Language Model. We test our features on existing benchmarks and a novel dataset (Surprising Visual Genome) containing surprising images from Visual Genome. Our experiments demonstrate that modeling complexity accurately is not as simple as previously thought, requiring additional perceptual and semantic factors to address dataset biases. Our model improves predictive performance while maintaining interpretability, offering deeper insights into how visual complexity is perceived and assessed. Our code, analysis and data are available at https://github.com/Complexity-Project/Complexity-in-Complexity.
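
To make the two image features more concrete, here is a minimal Python sketch of the general idea behind them, not the authors' implementation (which is in the linked repository). The function names, the block-average downsampling, the scale factors (1, 2, 4), the equal weighting across scales, and the 4-bit color quantization are all assumptions chosen for illustration.

```python
import numpy as np

def downsample(img, factor):
    """Block-average an image (H x W or H x W x C) by an integer factor."""
    h, w = img.shape[:2]
    h2, w2 = h - h % factor, w - w % factor  # crop so the blocks tile evenly
    blocks = img[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def sobel_magnitude(gray):
    """Mean Sobel gradient magnitude of a 2-D float array."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return float(np.mean(np.hypot(gx, gy)))

def msg_sketch(rgb, factors=(1, 2, 4)):
    """Illustrative multi-scale gradient score (assumed: equal weight per scale)."""
    gray = rgb.astype(float).mean(axis=2) / 255.0
    return float(np.mean([sobel_magnitude(downsample(gray, f)) for f in factors]))

def muc_sketch(rgb, factors=(1, 2, 4), bits=4):
    """Illustrative multi-scale unique-color count (assumed: 4 bits per channel)."""
    counts = []
    for f in factors:
        small = downsample(rgb.astype(float), f).astype(np.uint8)
        quantized = small >> (8 - bits)  # keep only the top `bits` bits per channel
        counts.append(np.unique(quantized.reshape(-1, 3), axis=0).shape[0])
    return float(np.mean(counts))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full((32, 32, 3), 128, dtype=np.uint8)               # uniform gray image
    noisy = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # random colors
    print("flat :", msg_sketch(flat), muc_sketch(flat))
    print("noisy:", msg_sketch(noisy), muc_sketch(noisy))
```

On a uniform image both scores sit at their minimum (no gradients, one color), while a noisy image scores higher on both, which is the behavior the abstract attributes to structure and colorfulness cues.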

Paper Structure

This paper contains 24 sections, 1 equation, 10 figures, 5 tables, 3 algorithms.

Figures (10)

  • Figure 1: Left column: original images from Sav. Int. Right column: gradient visualizations. B: baseline prediction using number of segmentations and classes. G = ground truth complexity. P = predicted complexity using baseline and MSG. All values are scaled between 0 and 100. The first image has 177 segmentations and 35 classes, while the second has 185 segmentations and 38 classes. Due to these similarities, the baseline model predicts nearly identical complexity scores for both. However, MSG acts as a latent dimension, refining predictions to better align with ground truth.
  • Figure 2: Two images from the SVG dataset, racing sheep with toy riders and a flying skater, illustrate the improvement in complexity predictions when incorporating surprise. The baseline model underestimates the visual complexity, with scores of B: 54 and B: 46, while ground truth values are G: 60 and G: 55. Incorporating surprise scores (85) reduces this gap, yielding adjusted predictions of P: 61 and P: 55, demonstrating the role of surprise in aligning predictions more closely with human perception. Explanations provided by gemini-1.5-flash enhance the interpretability of the assigned surprise scores.
  • Figure 3: Correlation between residuals (actual complexity - baseline predictions) and surprise scores (SVG).
  • Figure 4: A selection of responses from participants describing the strategies they employed to assess visual complexity. Their responses highlight various factors such as the number of elements, unusual or weird features, level of detail, and how confusing or unreal an image appeared. Words related to surprise, such as ‘unusual,’ ‘confusing,’ and ‘weird,’ have been highlighted in red for emphasis.
  • Figure 5: Comparison of two images having similar values of visual features but with differing complexity evaluations.
  • ...and 5 more figures
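
The residual analysis described in the Figure 3 caption (actual complexity minus baseline predictions, correlated with surprise scores) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the function name is hypothetical, the baseline is assumed to be an ordinary least-squares fit on segmentation-style features, and the data in the usage example is synthetic.

```python
import numpy as np

def residual_surprise_correlation(features, complexity, surprise):
    """Pearson r between baseline residuals and surprise scores.

    features:   (n, k) array of baseline predictors (e.g. segment and class counts)
    complexity: (n,) array of human complexity ratings
    surprise:   (n,) array of surprise scores
    """
    X = np.column_stack([np.ones(len(features)), features])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, complexity, rcond=None)    # least-squares baseline
    residuals = complexity - X @ coef                        # actual minus predicted
    return float(np.corrcoef(residuals, surprise)[0, 1])

if __name__ == "__main__":
    # Synthetic data only: ratings depend on a baseline feature and on surprise.
    rng = np.random.default_rng(0)
    n = 200
    num_segments = rng.uniform(10, 200, size=n)
    surprise = rng.uniform(0, 100, size=n)
    complexity = 0.2 * num_segments + 0.3 * surprise + rng.normal(0, 2, size=n)
    r = residual_surprise_correlation(num_segments[:, None], complexity, surprise)
    print(f"residual vs. surprise correlation: {r:.2f}")
```

A clearly positive correlation in such a check indicates that the baseline leaves variance unexplained that surprise can account for, which is the pattern the caption refers to.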