Riesz feature representation: scale equivariant scattering network for classification tasks

Tin Barisin, Jesus Angulo, Katja Schladitz, Claudia Redenbach

TL;DR

The paper tackles scale sensitivity in traditional scattering descriptors by introducing a Riesz-transform–based feature representation that is scale-equivariant and avoids explicit scale sampling. It constructs a hierarchical, nonexpansive representation from first- and higher-order Riesz transforms and a steerable base function, culminating in a compact 85-feature descriptor that achieves scale generalization through global pooling. Empirical results on MNIST Large Scale, KTH-tips, and CIFAR-10 demonstrate robust performance under unseen scales and competitive texture/digit classification, while highlighting advantages in data efficiency and stability over purely data-driven deep nets. The work also points to promising hybrid integrations with CNNs to combine discriminative power with scale-robustness, and outlines avenues for scale-aware bounding boxes and extended applications.

Abstract

Scattering networks yield powerful and robust hierarchical image descriptors which do not require lengthy training and which work well with very little training data. However, they rely on sampling the scale dimension. Hence, they become sensitive to scale variations and are unable to generalize to unseen scales. In this work, we define an alternative feature representation based on the Riesz transform. We detail and analyze the mathematical foundations behind this representation. In particular, it inherits scale equivariance from the Riesz transform and completely avoids sampling of the scale dimension. Additionally, the number of features in the representation is reduced by a factor of four compared to scattering networks. Nevertheless, our representation performs comparably well for texture classification with an interesting addition: scale equivariance. Our method yields superior performance when dealing with scales outside of those covered by the training dataset. The usefulness of the equivariance property is demonstrated on the digit classification task, where accuracy remains stable even for scales four times larger than the one chosen for training. As a second example, we consider classification of textures.
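
To make the scale-equivariance claim concrete, here is a minimal sketch of the first-order Riesz transform only, not the paper's full hierarchical representation. It assumes the standard Fourier multiplier $-i\,\xi_j/|\xi|$ and a periodic image grid; the function names `riesz_first_order` and `horizontal_wave` are hypothetical and chosen for illustration. Because the multiplier depends only on the direction of the frequency vector and not on its length, no scale parameter appears anywhere in the code.

```python
import numpy as np


def riesz_first_order(image):
    """Return the two first-order Riesz components (R1 f, R2 f) of a 2-D array.

    Assumption: the multiplier -i * xi_j / |xi| is applied on a periodic grid.
    """
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies, shape (rows, 1)
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies, shape (1, cols)
    norm = np.sqrt(fx ** 2 + fy ** 2)
    norm[0, 0] = 1.0                     # avoid 0/0; the numerator is 0 at DC anyway
    spectrum = np.fft.fft2(image)
    r1 = np.fft.ifft2(-1j * fx / norm * spectrum).real
    r2 = np.fft.ifft2(-1j * fy / norm * spectrum).real
    return r1, r2


def horizontal_wave(size, cycles):
    """Illustrative test image: a cosine that varies along the horizontal axis."""
    x = np.arange(size)
    return np.tile(np.cos(2 * np.pi * cycles * x / size), (size, 1))


# The same pattern at two scales: `coarse` is `fine` dilated by a factor of 2.
fine = horizontal_wave(64, 8)
coarse = horizontal_wave(64, 4)

r1_fine, r2_fine = riesz_first_order(fine)
r1_coarse, r2_coarse = riesz_first_order(coarse)

# Equivariance check: transforming the dilated image gives the dilated transform,
# so every second column of r1_coarse should match the first half of r1_fine.
equivariant = np.allclose(r1_coarse[:, ::2], r1_fine[:, :32])
print("scale equivariant on this example:", equivariant)
```

In this toy case the input is a pure cosine, so the horizontal component reduces to the matching sine wave and the vertical component vanishes. Real images would additionally need a choice of boundary handling, which this sketch leaves out.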

Paper Structure

This paper contains 31 sections, 5 theorems, 43 equations, 7 figures, and 9 tables.

Key Result

Theorem 1

The N-th order Riesz transform achieves the following decomposition:

Figures (7)

  • Figure 1: The 10 classes of the MNIST Large Scale data set. All images have $112 \times 112$ pixels.
  • Figure 2: Variation of scales in the MNIST Large Scale data set (from left to right): scales 0.5, 1, 2, 4, and 8. All images have $112 \times 112$ pixels.
  • Figure 3: Extracting the bounding box (right) from the input image (left). The input image consists of $112\times112$ pixels, the bounding box of $16\times24$ pixels.
  • Figure 4: Sample classes in KTH-tips dataset.
  • Figure 5: Scale variation in the KTH-tips dataset for the aluminium foil sample.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof