Learning Color Equivariant Representations

Yulong Yang, Felix O'Mahony, Christine Allen-Blanchette

TL;DR

This work addresses the sensitivity of conventional CNNs to color perturbations by introducing color-equivariant GCNNs built on hue, saturation, and luminance groups. A lifting layer transforms the input image directly into the color group space, which yields equivariance to hue, saturation, and luminance shifts and avoids the invalid RGB values produced by the prior CEConv approach. The approach reduces equivariance error by over three orders of magnitude, improves generalization under out-of-distribution color variations, and improves sample efficiency across synthetic and real datasets, including Hue-shift MNIST, Hue-shift 3D Shapes, Camelyon17, and several large-scale benchmarks. These color-aware representations enable new tasks such as color-based sorting and support robust perception under color variation, with future work aimed at continuous group extensions and computational optimization.

Abstract

In this paper, we introduce group convolutional neural networks (GCNNs) equivariant to color variation. GCNNs have been designed for a variety of geometric transformations, from 2D and 3D rotation groups to semi-groups such as scale. Despite the improved interpretability, accuracy, and generalizability of these architectures, GCNNs have seen limited application in the context of perceptual quantities. Notably, the recent CEConv network uses a GCNN to achieve equivariance to hue transformations by convolving input images with a hue-rotated RGB filter. However, this approach leads to invalid RGB values, which break equivariance and degrade performance. We resolve these issues with a lifting layer that transforms the input image directly, thereby circumventing the issue of invalid RGB values and improving equivariance error by over three orders of magnitude. Moreover, we extend the notion of color equivariance to include equivariance to saturation and luminance shift. Our hue-, saturation-, luminance-, and color-equivariant networks achieve strong generalization to out-of-distribution perceptual variations and improved sample efficiency over conventional architectures. We demonstrate the utility of our approach on synthetic and real-world datasets, where we consistently outperform competitive baselines.
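
As a rough illustration of the lifting idea described in the abstract, the sketch below lifts a tiny image to a discrete hue group by shifting hue in HSL space, then checks that shifting the input hue by one group step cyclically permutes the lifted copies. The function names (`shift_hue`, `lift_to_hue_group`), the pixel values, the group order, and the use of Python's standard-library `colorsys` module are all illustrative assumptions; this is not the authors' implementation.

```python
import colorsys

def shift_hue(rgb, delta):
    """Shift the hue of one RGB pixel (components in [0, 1]) by `delta` turns."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    # Hue lives on a circle, so the shift is addition modulo one full turn.
    return colorsys.hls_to_rgb((h + delta) % 1.0, l, s)

def lift_to_hue_group(image, n):
    """Return n hue-shifted copies of `image`; copy k is shifted by k/n of a turn."""
    return [[shift_hue(px, k / n) for px in image] for k in range(n)]

def images_close(a, b, tol=1e-9):
    """Compare two images (lists of RGB tuples) channel by channel."""
    return all(abs(x - y) <= tol for pa, pb in zip(a, b) for x, y in zip(pa, pb))

# Hypothetical four-pixel image with non-gray pixels (gray pixels have no hue).
image = [(0.9, 0.2, 0.1), (0.1, 0.6, 0.3), (0.2, 0.3, 0.8), (0.7, 0.7, 0.2)]
n = 4  # assumed order of the discretized hue group

lifted = lift_to_hue_group(image, n)
shifted_input = [shift_hue(px, 1 / n) for px in image]
lifted_shifted = lift_to_hue_group(shifted_input, n)

# Equivariance of the lift: shifting the input by one step moves copy k+1 into slot k.
is_equivariant = all(
    images_close(lifted_shifted[k], lifted[(k + 1) % n]) for k in range(n)
)
print("lift is equivariant to a one-step hue shift:", is_equivariant)
```

Because the hue shift acts on the image itself and wraps around the hue circle, every lifted copy remains a valid color, which is the property the abstract contrasts with rotating RGB filters.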

Paper Structure (43 sections, 41 equations, 20 figures, 8 tables)

This paper contains 43 sections, 41 equations, 20 figures, 8 tables.

Figures (20)

  • Figure 1: Color-equivariant network. (a) The equivariance of our hue-equivariant model is illustrated by the commutativity of the (hue) rotation and neural network mapping. A hue rotation of $90^\circ$ in the input image space (top-left to bottom-left) results in a feature map rotation at each layer of the network (top-right to bottom-right). Corresponding feature maps are highlighted with a blue border. (b) An input image (left) is lifted to the hue-saturation group (right) by shifting its hue and saturation values. For comparison, we illustrate the CEConv lifting layer in Appendix \ref{Apx:CEConv_Rotation}.
  • Figure 2: Saturation-equivariant feature maps. We illustrate the equivariance of our saturation-equivariant model. A saturation shift in the input image space (top-left to bottom-left) results in a feature map translation at each layer of the network (top-right to bottom-right). Corresponding feature maps are highlighted with a blue border.
  • Figure 3: Impact of order on hue rotation invertibility. Our lifting layer (blue) operates on HSL input images where each hue rotation is invertible. The lifting layer proposed in CEConv (orange) operates on RGB filters and suffers from invalid hue rotations for all discretizations of the hue group except for $N=1$ and $N=3$ (i.e., symmetries of the axis-aligned RGB cube). (Left) We show the impact of invalid hue rotations on a four-pixel image. We rotate the image $60^\circ$ using our proposed lifting layer (top-left) and the CEConv lifting layer (bottom-left). Subsequently applying a $-60^\circ$ rotation yields an image that is indistinguishable from the original using our approach (top-left), and one with visible artifacts using the CEConv approach (bottom-left). (Right) We show the average restored image error for both approaches. Our approach results in a consistently negligible restored image error, whereas the CEConv approach results in a restored image error exceeding $7\%$ for all discretizations of the hue group except $N=1$ and $N=3$.
  • Figure 4: Model sample efficiency. We show the error improvement (higher is better) over the Z2CNN baseline as a function of the percentage of training examples used. The advantage of our Hue-$N$ models increases as the percentage of training examples used decreases.
  • Figure 5: Hue-shift MNIST feature map visualization. We compare the feature map trajectories of MNIST digits as their hue is varied from $1^\circ$ to $360^\circ$. The color of the trajectory corresponds to the class label. (a) t-SNE projection of hue-shifted feature map trajectories in the Z2CNN baseline model. As the hue of the input changes, the location of the digit in the feature space changes significantly. (b) t-SNE projection of hue-shifted feature map trajectories in our hue-equivariant CNN. In contrast to the Z2CNN baseline, the location of the digit in the feature space changes minimally.
  • ...and 15 more figures
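
The round-trip experiment described in the Figure 3 caption can be approximated with a short sketch. It compares a hue shift applied in HSL space with a rotation of RGB values about the gray axis (1, 1, 1), followed in each case by the inverse transformation. Several things here are assumptions made for illustration: the four pixel values, the helper names, the mean absolute error metric, and in particular the clipping of out-of-range RGB values to [0, 1] after the forward rotation. It is not the CEConv code or the authors' evaluation, so the printed errors are not expected to reproduce the figure's numbers.

```python
import colorsys
import math

def shift_hue_hsl(rgb, degrees):
    """Hue shift in HSL space: modular addition on the hue circle."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb((h + degrees / 360.0) % 1.0, l, s)

def rotate_rgb(rgb, degrees):
    """Rotate an RGB vector about the gray axis (1, 1, 1) via Rodrigues' formula."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    k = 1.0 / math.sqrt(3.0)                        # each component of the unit axis
    r, g, b = rgb
    along = k * k * (r + g + b) * (1.0 - c)         # axis * (axis . v) * (1 - cos)
    cross = (k * (b - g), k * (r - b), k * (g - r)) # axis x v
    return tuple(c * v + s * x + along for v, x in zip(rgb, cross))

def clip01(rgb):
    """Assumed handling of invalid values: clamp each channel to [0, 1]."""
    return tuple(min(1.0, max(0.0, v)) for v in rgb)

def mean_abs_error(a, b):
    diffs = [abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb)]
    return sum(diffs) / len(diffs)

def roundtrip_error_hsl(image, degrees):
    restored = [shift_hue_hsl(shift_hue_hsl(px, degrees), -degrees) for px in image]
    return mean_abs_error(image, restored)

def roundtrip_error_rgb(image, degrees):
    restored = [rotate_rgb(clip01(rotate_rgb(px, degrees)), -degrees) for px in image]
    return mean_abs_error(image, restored)

# Hypothetical four-pixel image: red, green, blue, yellow.
image = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

for degrees in (60, 120):
    print(degrees,
          "HSL round-trip error:", roundtrip_error_hsl(image, degrees),
          "RGB round-trip error:", roundtrip_error_rgb(image, degrees))
```

In this sketch a 60-degree rotation of pure red about the gray axis gives a negative blue channel, so clipping discards information and the inverse rotation cannot restore the pixel. A 120-degree rotation only permutes the channels, which is consistent with the caption's remark that $N=3$ is a symmetry of the RGB cube.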