A Hypertoroidal Covering for Perfect Color Equivariance

Yulong Yang, Zhikun Xu, Yaojun Li, Christine Allen-Blanchette

TL;DR

This paper introduces a color equivariant architecture that is truly equivariant. It resolves the approximation artifacts of previous methods, improves interpretability and generalizability, and achieves better predictive performance than conventional and equivariant baselines on tasks such as fine-grained classification and medical imaging.

Abstract

When the color distribution of input images changes at inference, the performance of conventional neural network architectures drops considerably. A few researchers have begun to incorporate prior knowledge of color geometry in neural network design. These color equivariant architectures model hue variation as 2D rotations, and saturation and luminance transformations as 1D translations. While this approach improves neural network robustness to color variations in a number of contexts, we find that approximating saturation and luminance (interval-valued quantities) as 1D translations introduces appreciable artifacts. In this paper, we introduce a color equivariant architecture that is truly equivariant. Instead of approximating the interval with the real line, we lift values on the interval to values on the circle (a double-cover) and build equivariant representations there. Our approach resolves the approximation artifacts of previous methods, improves interpretability and generalizability, and achieves better predictive performance than conventional and equivariant baselines on tasks such as fine-grained classification and medical imaging. Going beyond the context of color, we show that our proposed lifting can also extend to geometric transformations such as scale.
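
The difference between treating an interval-valued quantity as a point on the real line and lifting it to the circle can be illustrated with a small sketch. This is not the paper's implementation: the function names (`lift`, `project`, `shift_circle`, `shift_clipped`) and the specific parameterization (angle $\theta = \pi s$, folded back with a triangle-wave projection) are assumptions chosen for illustration only. The sketch shifts a saturation value down and then back up by $0.75$: the clipped 1D translation loses information at the boundary, while the rotation on the circle is exactly invertible.

```python
import math

def lift(s):
    """Map s in [0, 1] to an angle in [0, pi] on the circle (assumed parameterization)."""
    return math.pi * s

def project(theta):
    """Fold the circle back onto [0, 1]; theta and -theta map to the same value (two-to-one)."""
    theta = theta % (2 * math.pi)
    if theta > math.pi:
        theta = 2 * math.pi - theta
    return theta / math.pi

def shift_circle(theta, delta):
    """A shift by delta (in interval units) acts as a rotation, so it can always be undone."""
    return (theta + math.pi * delta) % (2 * math.pi)

def shift_clipped(s, delta):
    """Baseline: translate on the real line, then clip back into [0, 1]."""
    return min(1.0, max(0.0, s + delta))

s = 0.5

# Shift down by 0.75, then up by 0.75, on the circle.
theta = shift_circle(shift_circle(lift(s), -0.75), 0.75)
restored_circle = project(theta)

# The same round trip with clipped translations on the interval.
restored_clipped = shift_clipped(shift_clipped(s, -0.75), 0.75)

print(restored_circle)   # approximately 0.5, up to floating-point error
print(restored_clipped)  # 0.75: the first shift saturated at 0 and cannot be undone
```

In this toy setting the round trip on the circle recovers the input, whereas the clipped translation does not, which is the kind of boundary artifact the abstract attributes to the 1D-translation approximation.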

Paper Structure (46 sections, 43 equations, 15 figures, 8 tables)


Figures (15)

  • Figure 1: Hue, saturation, and luminance lifting. We lift an image with a white background with respect to the hue, saturation, and luminance channels. (Top) Hue lifting of $\mathbb{T}^{3}$CEN, which follows the hue lifting proposed in yang2024learning. (Middle) Luminance lifting of $\mathbb{T}^{3}$CEN using a double-cover to give cyclic behavior to the luminance group. Because the background is white, the double-cover yields black/gray backgrounds. (Bottom) Saturation lifting of $\mathbb{T}^{3}$CEN using a double-cover to give cyclic behavior to the saturation group.
  • Figure 2: $\mathbb{T}^{3}$CEN and LCER feature maps under HSL shifts. The feature maps of $\mathbb{T}^{3}$CEN are equivariant to shifts in hue, saturation, and luminance, while the feature maps of LCER are only equivariant to shifts in hue. (a) The images are related by a $90^{\circ}$ hue rotation. (b) The images are related by a $0.5$ shift in saturation. (c) The images are related by a $0.5$ shift in luminance. In all cases, because our $\mathbb{T}^{3}$CEN network is equivariant to color shifts (i.e., hue, saturation, and luminance shifts), the feature maps transform predictably (they are cyclically permuted). Conversely, LCER is only equivariant to hue shifts.
  • Figure 3: Saturation equivariance error. The normalized saturation equivariance error for $\mathbb{T}^{3}$CEN and LCER is reported. $\mathbb{T}^{3}$CEN has an average error of $4.66\times10^{-6}$, while LCER has an average error of $0.445$.
  • Figure 4: Lifting error comparison. An input image is lifted to the respective saturation group, shifted down by $0.75$, and shifted up by $0.75$. We compare the restored and original input image for $\mathbb{T}^{3}$CEN (top) and LCER (bottom). The average 8-bit integer RGB error is $6.33\times10^{-6}$ for $\mathbb{T}^{3}$CEN and $8.65$ for LCER.
  • Figure 5: Lifting coverage and cases of degenerate lifting. (Top) Coverage of the lifted representation for different input saturation/luminance values. The original input value is shown in black, and the first, second, and third lifted representations are shown in blue, orange, and green, respectively. (Bottom) Information entropy of the lifted representation. Cases of degenerate lifting are highlighted vertically in red.
  • ...and 10 more figures