Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Lingjun Mao, Zineng Tang, Alane Suhr
TL;DR
This work introduces RCID, a large-scale Realistic Color Illusion Dataset, to systematically evaluate color-illusion perception in vision-language systems. It presents an automated generation pipeline using ControlNet diffusion and procedural synthesis to create 19,000 photorealistic illusion images across contrast, stripe, and filter types, with human-validated labels and QA prompts. Through extensive experiments on open-source VLMs, the study demonstrates that models exhibit human-like perceptual biases on illusion content, influenced by prompting, fine-tuning, model size, and prior knowledge such as language and commonsense. The findings reveal the dual influence of the visual system and prior knowledge on VLMs and provide a practical baseline and guidelines for illusion-aware evaluation, with implications for safety and reliability in color-aware tasks.
Abstract
We study the perception of color illusions by vision-language models. Color illusions, in which a person's visual system perceives a color differently from the actual color, are well studied in human vision. However, it remains underexplored whether vision-language models (VLMs), trained on large-scale human data, exhibit similar perceptual biases when confronted with such color illusions. We propose an automated framework for generating color illusion images, resulting in RCID (Realistic Color Illusion Dataset), a dataset of 19,000 realistic illusion images. Our experiments show that all studied VLMs exhibit perceptual biases similar to human vision. Finally, we train a model to distinguish both human perception and actual pixel differences.
