The Art of Deception: Color Visual Illusions and Diffusion Models
Alex Gomez-Villa, Kai Wang, Alejandro C. Parraga, Bartlomiej Twardowski, Jesus Malo, Javier Vazquez-Corral, Joost van de Weijer
TL;DR
This work investigates why visual illusions (VIs) arise in both humans and diffusion models by studying DDIM inversion trajectories, showing that intermediate latents undergo human-like brightness and color shifts. It develops a region-targeted VI generation pipeline for text-to-image diffusion models, guided by a perceptual loss and a region-compatibility term, and validates the approach on VI datasets and in psychophysical experiments. Key contributions include (i) empirical replication of brightness/color illusions in diffusion models across VI datasets, (ii) a method to generate novel VIs within natural images with region-specific control, and (iii) psychophysical confirmation that model-generated illusions can fool human observers, outperforming classical baselines. The results suggest that diffusion processes encode perceptual statistics akin to those of human vision, with practical implications for perceptually informed image editing and more robust, human-aligned vision-language systems.
Abstract
Visual illusions in humans arise when interpreting out-of-distribution stimuli: if the observer is adapted to certain statistics, perception of outliers deviates from reality. Recent studies have shown that artificial neural networks (ANNs) can also be deceived by visual illusions. This revelation raises profound questions about the nature of visual information. Why are two independent systems, human brains and ANNs, susceptible to the same illusions? Should any ANN be capable of perceiving visual illusions? Are these perceptions a feature or a flaw? In this work, we study how visual illusions are encoded in diffusion models. Remarkably, we show that these models exhibit human-like brightness/color shifts in their latent space. We use this fact to demonstrate that diffusion models can predict visual illusions. Furthermore, we show how to generate new, unseen visual illusions in realistic images using text-to-image diffusion models. We validate this ability through psychophysical experiments that show that our model-generated illusions also fool humans.
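
To make the idea of reading brightness shifts off a DDIM inversion trajectory more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear beta schedule, deterministic DDIM (eta = 0), and a hypothetical `toy_eps_model` standing in for a trained noise predictor; all function names are illustrative. With the toy predictor the numbers carry no perceptual meaning; the sketch only shows the bookkeeping of inverting a stimulus and comparing two physically identical target regions along the trajectory.

```python
import numpy as np

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def toy_eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t)."""
    return 0.1 * np.tanh(x)

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two cumulative alpha levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alpha_bars):
    """Invert a clean input toward noise and return every intermediate latent."""
    trajectory = [x0]
    x, a_prev = x0, 1.0  # alpha_bar = 1 corresponds to the clean input
    for t, a_t in enumerate(alpha_bars):
        eps = eps_model(x, t)
        x = ddim_move(x, eps, a_prev, a_t)
        a_prev = a_t
        trajectory.append(x)
    return trajectory

def region_means(trajectory, mask):
    """Mean value inside a boolean mask for each latent in the trajectory."""
    return np.array([latent[mask].mean() for latent in trajectory])

# Toy stimulus: two identical gray patches, one on a dark and one on a bright
# background (a simultaneous-contrast-style layout).
image = np.zeros((8, 16))
image[:, 8:] = 1.0
image[3:5, 3:5] = 0.5
image[3:5, 11:13] = 0.5

left = np.zeros(image.shape, dtype=bool)
left[3:5, 3:5] = True
right = np.zeros(image.shape, dtype=bool)
right[3:5, 11:13] = True

traj = ddim_invert(image, toy_eps_model, alpha_bar_schedule(50))

# Difference between the two physically identical patches at each step.
print(region_means(traj, left) - region_means(traj, right))
```

In an actual experiment, the toy predictor would be replaced by the noise-prediction network of a pretrained diffusion model, and the per-step difference between the two regions would be compared with the direction of the shift reported by human observers.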
