Emergence of unique hues from sparse coding of color in natural scenes

Alexander Belsten, E. Paxon Frady, Bruno A. Olshausen

Abstract

Our subjective experience of color is typically described by abstract properties such as hue, saturation, and brightness that do not directly correspond to sensory signals arising from cones in the retina. Along the hue dimension, certain colors -- red, green, blue, and yellow -- appear unique in that they are not perceived as a combination of other colors, and the pairs red-green and blue-yellow appear opposites. However, the anatomical and physiological correlates of these 'unique hues' within the brain and the reason for their existence remain a mystery. Here, we demonstrate a direct connection between these hues and the statistics of the natural visual environment. Analysis of simulated cone responses on a dataset of 503 calibrated natural images reveals a strongly non-Gaussian distribution in 3D color space, with heavy tails in distinct, asymmetrically arranged directions. A sparse coding model is then adapted to this data so as to minimize the total sum of coefficients on the basis vectors for representing the data. A six basis-vector model converges to the four unique hues in addition to black and white. Moreover, we find that the nonlinear nature of inference in the sparse coding model yields both excitatory and inhibitory interactions among latent variables; the former facilitates combining adjacent pairs of unique hues to encode intermediate hues situated between them, while the latter enforces mutual exclusivity between opposite unique hues. Together, these findings shed new light on the distribution of color in the natural environment and provide a linking principle between this structure and the phenomenology of color appearance.
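The inference step described in the abstract can be illustrated with a small sketch. It assumes an energy of the form $\frac{1}{2}\|\mathbf{x} - \mathbf{A}\mathbf{s}\|^2 + \lambda \sum_i s_i$ with $s_i \ge 0$, minimized here by projected ISTA (a standard solver chosen for illustration, not necessarily the one used in the paper). The toy basis `A`, the input `x`, and the value of `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def infer_nonneg_sparse(x, A, lam=0.1, n_iter=500):
    """Approximate coefficients s >= 0 that minimize
    0.5 * ||x - A @ s||^2 + lam * sum(s), using projected ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ s - x)            # gradient of the squared-error term
        # Gradient step on the full energy, then projection onto s >= 0.
        s = np.maximum(s - step * (grad + lam), 0.0)
    return s

# Hypothetical toy basis (an assumption, not the adapted basis):
# six unit vectors pointing along +/- each axis of a 3D space.
A = np.hstack([np.eye(3), -np.eye(3)])
x = np.array([1.0, 0.0, -0.5])
s = infer_nonneg_sparse(x, A, lam=0.1)
print(np.round(s, 3))
```

In this toy setting the coefficients on opposing vectors are never active together: the positive first component of `x` activates only the first vector, and the negative third component activates only its opposite. This is a simplified analogue of the mutual exclusivity between opponent colors that the abstract describes.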

Paper Structure

This paper contains 18 sections, 9 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Three-stage model of chromatic processing in the visual system. a) The pixels of a large dataset of natural images are represented in LMS cone activation space (stage 1). b) A linear sphering transform decorrelates the activations by first rotating to the principal components, yielding a set of cone-opponent bases, and then rescaling each axis to achieve unit variance in all directions (stage 2). c) A set of bases that efficiently represent higher-order structure in the cone-opponent space is derived using nonnegative sparse coding models (stage 3). A six basis-vector model converges to directions that align with the unique hues, forming a new 6D, nonnegative space whose axes correspond to perceptual color categories. The inference process of sparse coding leads to mutual exclusivity between opponent colors (i.e., red-green, blue-yellow, black-white) depicted here in a 3D space with opposing colors at opposite ends of each axis. Notably, these color-opponent axes, when projected back into the cone-opponent space, are non-orthogonal and not aligned with the cone-opponent axes. Circles in b and c indicate chromatic planes.
  • Figure 2: a) Principal components of the distribution of LMS activations. The weights' alignment with the horizontal dashed lines at $\pm\sqrt{1/6}$, $\pm\sqrt{1/3}$, $\pm\sqrt{1/2}$, and $\pm\sqrt{2/3}$ reveals the near-integer-multiple relationship in how they mix L, M, and S signals (as pointed out by Ruderman et al., 1998). b) Variance of each principal component. c-e) LMS activation distributions in the sphered color space exhibit anisotropic, non-Gaussian structure. Solid lines show isoprobability contours after projection onto three different 2D planes defined by each possible pairing of axes in the sphered color space, as indicated by the axis labels. Dashed contours indicate isoprobability contours for a multivariate isotropic Gaussian distribution with the same variance as the data distribution. The contours are equally spaced in log-probability, using the same color scale for both solid and dashed contours. The color circle inset in e indicates the hue of each angle within the cone-opponent chromatic plane.
  • Figure 3: a) Adapted nonnegative sparse coding basis vectors are shown in the sphered color space. The number of basis vectors is varied from left to right from four to eight. Each basis vector has unit length. The black ellipse depicts a unit-radius circle within the chromatic plane. b) Basis vectors from a are shown projected into the chromatic plane. The black circle depicts unit radius, and the outer ring shows the corresponding color for each hue angle. The gray curve reproduces the $\log_{10}\text{probability}=-4$ contour from Figure 2e; it is uniformly rescaled for visual comparison, so shape is preserved but absolute scale is not. Vectors closely aligned with the achromatic axis project to (near) zero and may not be visible.
  • Figure 4: Response maps of latent variable MAP activations ${s}_i$ as a function of 2D location within the Munsell color chart (top). Each panel shows the iso-response contours for each ${s}_i$ within a particular sparse coding model, with the number of basis vectors $m$ increasing top to bottom. Solid contour lines indicate response levels from 55% to 95% of each latent variable’s maximum, in 10% increments. Contour color indicates the direction of the corresponding basis vector $\mathbf{a}_i$ in sphered color space. The four-basis-vector model places strong weight on its two largely achromatic basis vectors to describe the blue/green directions. This is due to their opposing nature and the lack of other basis vectors that could be used to describe this region of color space. To show that these vectors are indeed sensitive to luminance variation, their iso-response contours are shown at 20% and 30% of each coefficient's maximum as dashed lines.
  • Figure 5: Comparison of the adapted six-basis-vector model with alternative models containing six basis vectors. a) The three bases analyzed, projected onto the chromatic plane. All bases span the full 3D space and include two achromatic vectors (white and black). Left to right: the adapted basis, a cardinal cone-opponent basis, and a teal-lime-orange-purple basis. b) Reconstruction quality (mean squared error) versus sparsity (average $\ell_1$ norm) curves for each basis, sweeping across sparsity thresholds $\lambda$. Across thresholds, the adapted basis consistently achieves lower MSE and greater sparsity. c) Each basis is rotated in azimuth (about the vertical axis), altering the hue tuning of the chromatic vectors. d) Energy (the sum of MSE and $\ell_1$ sparsity weighted by $\lambda$) as a function of azimuth rotation (see c). The minimal-energy configuration occurs for the adapted basis at zero rotation.
  • ...and 5 more figures
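
The sphering transform described in the caption of Figure 1b (rotate to the principal components, then rescale each axis to unit variance) can be sketched as follows. This is a minimal illustration under stated assumptions: the synthetic data `X` stands in for LMS cone activations, the mixing matrix `M` is hypothetical, and the mean subtraction and the small constant `eps` are choices made for this example rather than details confirmed by the source.

```python
import numpy as np

def sphere(X, eps=1e-12):
    """Sphere data X of shape (n_samples, n_dims).

    Returns the sphered data Y and the linear transform W, with Y = Xc @ W.T,
    where Xc is the mean-centered data.
    """
    Xc = X - X.mean(axis=0)                  # centering (an assumption here)
    cov = np.cov(Xc, rowvar=False)           # covariance across dimensions
    evals, evecs = np.linalg.eigh(cov)       # principal axes are columns of evecs
    # Rotate onto the principal axes, then divide each axis by its std. dev.
    W = (evecs / np.sqrt(evals + eps)).T
    return Xc @ W.T, W

# Synthetic stand-in for correlated cone activations (hypothetical values).
rng = np.random.default_rng(0)
M = np.array([[1.0, 0.9, 0.3],
              [0.0, 0.4, 0.2],
              [0.0, 0.0, 0.1]])
X = rng.normal(size=(5000, 3)) @ M
Y, W = sphere(X)
print(np.round(np.cov(Y, rowvar=False), 3))  # approximately the identity matrix
```

After sphering, second-order correlations are removed, so any remaining structure in `Y` is higher-order. That is the kind of structure the sparse coding stage of Figure 1c is meant to capture.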