Scale-covariant and scale-invariant Gaussian derivative networks

Tony Lindeberg

TL;DR

The paper addresses the problem that conventional deep networks are not inherently scale covariant and therefore handle scale variations in image data poorly. It proposes a hybrid architecture in which the layers are built from scale-space primitives, specifically linear combinations of Gaussian derivatives, coupled in cascade, with the learned weights shared across multiple scale channels; max pooling over these channels yields provable scale invariance. The author proves scale covariance for the cascade and, under ideal conditions, scale invariance after scale-channel pooling, and validates the approach with single- and multi-scale-channel experiments on MNIST and the MNIST Large Scale dataset, demonstrating generalization to scales not seen during training. The work offers a principled, compact parameterization for deep networks that generalizes across scales without heavy data augmentation, with potential impact on robust recognition in real-world, scale-variant settings.

Abstract

This paper presents a hybrid approach between scale-space theory and deep learning, where a deep learning architecture is constructed by coupling parameterized scale-space operations in cascade. By sharing the learnt parameters between multiple scale channels, and by using the transformation properties of the scale-space primitives under scaling transformations, the resulting network becomes provably scale covariant. By in addition performing max pooling over the multiple scale channels, a resulting network architecture for image classification also becomes provably scale invariant. We investigate the performance of such networks on the MNISTLargeScale dataset, which contains rescaled images from original MNIST over a factor of 4 concerning training data and over a factor of 16 concerning testing data. It is demonstrated that the resulting approach allows for scale generalization, enabling good performance for classifying patterns at scales not present in the training data.
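
To make the covariance and invariance claims concrete, the sketch below checks them numerically in a deliberately simplified setting. It is not the paper's network: it uses a single layer, a 1-D signal, no learned weights and no ReLU, and all function and variable names are hypothetical. The property it relies on is the standard one for scale-normalized Gaussian derivatives: if $f'(x') = f(x)$ for $x' = Sx$, then the scale-normalized derivative $\sigma^m \partial_{x^m}$ of the Gaussian-smoothed signal takes the same value at matching points $x' = Sx$ and matching scales $\sigma' = S\sigma$. Max pooling over a scale grid that contains both $\sigma$ and $S\sigma$ should therefore give approximately the same value for the original and the rescaled signal.

```python
import numpy as np

def normalized_second_derivative_kernel(sigma):
    """Sampled 1-D second-order Gaussian derivative, multiplied by sigma**2
    (scale normalization). The 5-sigma truncation is an assumption of this sketch."""
    radius = int(np.ceil(5 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    g_xx = (x**2 / sigma**4 - 1 / sigma**2) * g
    return sigma**2 * g_xx

def scale_channel_responses(signal, sigmas):
    """Response magnitude at the signal centre, one value per scale channel."""
    centre = len(signal) // 2
    responses = []
    for sigma in sigmas:
        kernel = normalized_second_derivative_kernel(sigma)
        filtered = np.convolve(signal, kernel, mode="same")
        responses.append(abs(filtered[centre]))
    return np.array(responses)

# Toy input: a Gaussian blob and a copy rescaled by S = 2 (illustrative values).
x = np.arange(-1024, 1025, dtype=float)
blob_small = np.exp(-x**2 / (2 * 4.0**2))   # blob of width 4
blob_large = np.exp(-x**2 / (2 * 8.0**2))   # same blob, f'(x') = f(x'/2)

# Scale channels on a self-similar grid whose ratio equals the scaling factor S.
sigmas = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])

resp_small = scale_channel_responses(blob_small, sigmas)
resp_large = scale_channel_responses(blob_large, sigmas)

# Covariance: the response profile shifts by one channel under S = 2.
# Invariance: the maximum over the scale channels stays (nearly) the same.
pooled_small = resp_small.max()
pooled_large = resp_large.max()
print("strongest channel:", sigmas[resp_small.argmax()], sigmas[resp_large.argmax()])
print("pooled responses: ", pooled_small, pooled_large)
```

The scaling factor here lies exactly on the scale grid, which is the ideal case. For scaling factors between grid points, or near the ends of the grid, the pooled values would only match approximately.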

Paper Structure

This paper contains 19 sections, 24 equations, 9 figures.

Figures (9)

  • Figure 1: Illustration of the importance of having matching support regions of receptive fields when handling scaling transformations in the image domain. For this figure, we have simulated the effect of varying the distance between the object and the camera by varying the amount of zoom for a zoom lens. The left column illustrates the effect of having a fixed receptive field size in the image domain, and how that fixed receptive field size affects the backprojected receptive fields in the world, if there are significant scale variations. In the right column, the receptive field sizes are matched under the scaling transformation, as enabled by scale covariant receptive field families and scale channel networks, which makes it possible to define deep networks that are invariant to scaling transformations, which in turn enables scale generalization.
  • Figure 2: Commutative diagram for a scale-parameterized feature map operator $\Gamma_s$ that is applied to image data under scaling transformations. The commutative diagram, which should be read from the lower left corner to the upper right corner, means that irrespective of whether the input image is first subject to a scaling transformation and then the computation of a feature map, or whether the feature map is computed first and then transformed by a scaling transformation, we should get the same result. Note, however, that this definition of scale covariance assumes a multi-scale representation of the image data, and that direct access to the image representations at the matching scale levels $s' = S s$ is necessary to complete the commutative diagram.
  • Figure 3: The 2-D Gaussian kernel with its Cartesian partial derivatives up to order two for $\sigma = 4$.
  • Figure 5: Commutative diagram for a scale-covariant Gaussian derivative network constructed by coupling linear combinations of scale-normalized Gaussian derivatives in cascade, with non-linear ReLU stages in between. Because of the transformation properties of the individual layers under scaling transformations, it will be possible to perfectly match the corresponding layers $F_k$ and $F_k'$ under a scaling transformation of the underlying image domain $f'(x') = f(x)$ for $x' = Sx$ and $y' = S y$, provided that the scale parameter $\sigma_k$ in layer $k$ is proportional to the scale parameter $\sigma_1$ in the first layer, $\sigma_k = r_k^2 \, \sigma_1$, for some scalar constant $r_k > 1$. For such a network, the scale parameters in the two domains should be related according to $\sigma_k' = S \sigma_k$. Note, however, that for a scale-discretized implementation, this commutative property holds exactly over a continuous image domain only if the scale levels $\sigma$ and $\sigma'$ are part of the scale grid, thus specifically only for discrete scaling factors $S$ that can be exactly represented on the discrete scale grid. For other scaling factors, the results will instead be numerical approximations, with the accuracy of the approximation determined by the combination of the network architecture with the learning algorithm. (In this schematic illustration, we have for simplicity suppressed the notation for multiple feature channels in the different layers, and also suppressed the notation for the pointwise non-linearities between adjacent layers.)
  • Figure 6: (left) Schematic illustration of the architecture of the single-scale-channel network, with 6 layers of receptive fields at successively coarser levels of scale. (right) Schematic illustration of the architecture of a multi-scale-channel network, with multiple parallel scale channels over a self-similar distribution of the initial scale level $\sigma_0$ in the hierarchy of Gaussian derivative layers coupled in cascade.
  • ...and 4 more figures
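
The kernels described in the caption of Figure 3 can be reproduced with a few lines of NumPy. The following is a minimal sketch that samples the continuous 2-D Gaussian and its Cartesian partial derivatives up to order two at integer pixel positions for $\sigma = 4$. The function name, the dictionary keys and the 5-sigma truncation radius are assumptions made for illustration; the paper's actual discretization may differ.

```python
import numpy as np

def gaussian_derivative_kernels_2d(sigma):
    """Sample the 2-D Gaussian g and the derivatives g_x, g_y, g_xx, g_xy, g_yy
    on an integer grid. Truncating at 5 sigma is an assumption of this sketch."""
    radius = int(np.ceil(5 * sigma))
    coords = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(coords, coords, indexing="xy")
    s2 = sigma**2
    g = np.exp(-(x**2 + y**2) / (2 * s2)) / (2 * np.pi * s2)
    kernels = {
        "g": g,
        "g_x": -x / s2 * g,
        "g_y": -y / s2 * g,
        "g_xx": (x**2 / s2**2 - 1 / s2) * g,
        "g_xy": (x * y / s2**2) * g,
        "g_yy": (y**2 / s2**2 - 1 / s2) * g,
    }
    return kernels, x, y

kernels, x, y = gaussian_derivative_kernels_2d(sigma=4.0)

# Sanity checks based on known moments of the continuous kernels:
print(kernels["g"].sum())                  # close to 1: the Gaussian is normalized
print(kernels["g_x"].sum())                # close to 0: odd kernel
print((x**2 * kernels["g_xx"]).sum())      # close to 2: second derivative of x**2
print((x * y * kernels["g_xy"]).sum())     # close to 1: mixed derivative of x*y
```

Multiplying a derivative kernel of total order $m$ by $\sigma^m$ gives the scale-normalized form referred to in the caption of Figure 5.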