Complexity in Complexity: Understanding Visual Complexity Through Structure, Color, and Surprise
Karahan Sarıtaş, Peter Dayan, Tingke Shen, Surabhi S Nath
TL;DR
The paper addresses how humans perceive visual complexity and argues that interpretable, segmentation-based cues are insufficient on their own. It introduces three features to capture structural, chromatic, and holistic information: Multi-Scale Sobel Gradient (MSG) for structure, Multi-Scale Unique Color (MUC) for colorfulness, and surprise scores derived from a Large Language Model. The features are tested on several datasets, including a new one, Surprising Visual Genome (SVG). Using linear regression with cross-validated evaluation, the authors show that these features add explanatory power beyond segmentation counts, and that surprise provides a distinct, dataset-agnostic cue that improves predictions on SVG and complements MSG and MUC. The final model achieves state-of-the-art or near state-of-the-art performance across multiple datasets while preserving interpretability, and the SVG experiments demonstrate a meaningful link between surprise and perceived complexity. Overall, the work highlights the need for a multifaceted approach that combines low-level perceptual features with semantic and cognitive cues to robustly predict visual complexity across diverse imagery.
Abstract
Understanding how humans perceive visual complexity is a key area of study in visual cognition. Previous approaches to modeling visual complexity assessments have often resulted in intricate, difficult-to-interpret algorithms that employ numerous features or sophisticated deep learning architectures. While these complex models achieve high performance on specific datasets, they often sacrifice interpretability, making it challenging to understand the factors driving human perception of complexity. Recently, Shen et al. (2024) proposed an interpretable segmentation-based model that accurately predicted complexity across various datasets, supporting the idea that complexity can be explained simply. In this work, we investigate the failure of their model to capture structural, color, and surprisal contributions to complexity. To this end, we propose Multi-Scale Sobel Gradient (MSG), which measures spatial intensity variations; Multi-Scale Unique Color (MUC), which quantifies colorfulness across multiple scales; and surprise scores generated using a Large Language Model. We test our features on existing benchmarks and a novel dataset (Surprising Visual Genome) containing surprising images from Visual Genome. Our experiments demonstrate that modeling complexity accurately is not as simple as previously thought, requiring additional perceptual and semantic factors to address dataset biases. Our model improves predictive performance while maintaining interpretability, offering deeper insights into how visual complexity is perceived and assessed. Our code, analysis, and data are available at https://github.com/Complexity-Project/Complexity-in-Complexity.
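The abstract describes MSG and MUC only in words, so the following is a minimal sketch of what such features might look like, not the authors' implementation. The function names (`msg_sketch`, `muc_sketch`), the scale factors, the block-average downsampling, the 3-bit color quantization, and the averaging across scales are all assumptions made for illustration; the linked repository holds the actual definitions.

```python
import numpy as np


def downsample(img, factor):
    """Block-average downsampling by an integer factor (crops any remainder)."""
    h, w = img.shape[:2]
    h2, w2 = h // factor * factor, w // factor * factor
    img = img[:h2, :w2]
    new_shape = (h2 // factor, factor, w2 // factor, factor) + img.shape[2:]
    return img.reshape(new_shape).mean(axis=(1, 3))


def sobel_magnitude(gray):
    """Mean Sobel gradient magnitude of a 2-D float array."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return float(np.mean(np.hypot(gx, gy)))


def msg_sketch(rgb, factors=(1, 2, 4)):
    """Assumed form of MSG: Sobel magnitude averaged over several scales."""
    gray = rgb.astype(float).mean(axis=2) / 255.0
    return float(np.mean([sobel_magnitude(downsample(gray, f)) for f in factors]))


def muc_sketch(rgb, factors=(1, 2, 4), bits=3):
    """Assumed form of MUC: unique quantized colors averaged over several scales."""
    counts = []
    for f in factors:
        small = downsample(rgb.astype(float), f).astype(np.uint8)
        quantized = small >> (8 - bits)  # keep the top `bits` bits per channel
        counts.append(len(np.unique(quantized.reshape(-1, 3), axis=0)))
    return float(np.mean(counts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full((16, 16, 3), 128, dtype=np.uint8)
    noisy = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
    print("flat :", msg_sketch(flat), muc_sketch(flat))
    print("noisy:", msg_sketch(noisy), muc_sketch(noisy))
```

Under these assumptions, a uniformly colored image scores zero on the structure feature and one on the color feature, while a noisy image scores higher on both. Scores like these could then serve as regressors alongside segmentation counts.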
