Self-Supervised Learning of Color Constancy
Markus R. Ernst, Francisco M. López, Arthur Aubret, Roland W. Fleming, Jochen Triesch
TL;DR
This work investigates how color constancy (CC) could develop through self-supervised learning that exploits temporal changes in illumination. It introduces the Color Constancy Cubes (C3R) dataset and a time-contrastive learning framework (SimCLR-TT) that learns illumination-invariant representations by minimizing a contrastive loss $\mathcal{L}$ with $\tau=1$. After training, a linear probe on the frozen features accurately predicts object color under varying lighting, which demonstrates CC. The approach outperforms a color-jitter baseline, and the learned latent space shows emergent clustering by color. The study suggests a plausible developmental mechanism for CC, highlights the role of temporal structure and scene context (e.g., the ground plane), and discusses limitations and avenues for extending the approach to more realistic scenes and to joint encodings of color, shape, and viewpoint.
Abstract
Color constancy (CC) describes the ability of the visual system to perceive an object as having a relatively constant color despite changes in lighting conditions. While CC and its limitations have been carefully characterized in humans, it is still unclear how the visual system acquires this ability during development. Here, we present a first study showing that CC develops in a neural network trained in a self-supervised manner through an invariance learning objective. During learning, objects are presented under changing illuminations, while the network aims to map subsequent views of the same object onto close-by latent representations. This gives rise to representations that are largely invariant to the illumination conditions, offering a plausible example of how CC could emerge during human cognitive development via a form of self-supervised learning.
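The invariance objective described above, mapping subsequent views of the same object onto nearby latent representations, can be illustrated with a minimal sketch of a SimCLR-style contrastive (NT-Xent) loss applied to temporally adjacent views. This is an illustration under stated assumptions, not the authors' implementation: the function name `time_contrastive_loss`, the use of cosine similarity, and the reading of $\tau$ as a softmax temperature are assumptions, and the random vectors stand in for encoder outputs.

```python
import numpy as np

def time_contrastive_loss(z_t, z_next, tau=1.0):
    """Hypothetical NT-Xent-style loss for temporally adjacent views.

    z_t, z_next: arrays of shape (N, D). Row i of each array is assumed to be
    the embedding of the same object at two consecutive time steps
    (e.g., under two different illuminations).
    tau: assumed to act as a softmax temperature.
    """
    n = z_t.shape[0]
    z = np.concatenate([z_t, z_next], axis=0)            # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit norm -> cosine similarity
    sim = (z @ z.T) / tau                                # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                       # a view is never its own candidate

    # Index of the positive partner for every row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_norm
    return float(-log_prob_pos.mean())

# Toy stand-ins for encoder outputs (not real data).
rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 16))
views_b_same_objects = views_a + 0.01 * rng.normal(size=(8, 16))
views_b_unrelated = rng.normal(size=(8, 16))

loss_aligned = time_contrastive_loss(views_a, views_b_same_objects)
loss_unrelated = time_contrastive_loss(views_a, views_b_unrelated)
```

In this sketch the loss is lower when consecutive views of the same object already have similar embeddings than when the paired embeddings are unrelated, which is the pressure that would push an encoder toward illumination-invariant representations.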
