Steerable CNNs

Taco S. Cohen, Max Welling

TL;DR

The paper introduces Steerable CNNs, a framework that enforces equivariance to transformation groups through a representation-theoretic approach. By decomposing feature spaces into irreducible types and learning intertwiners between input and output representations, the authors achieve parameter-efficient, scalable equivariant layers via induced representations. Empirical results on CIFAR-10/100 show data-efficient gains and state-of-the-art performance, especially in low-data regimes and with mixed capsule types. The work lays a foundation for extending steerable architectures to continuous groups and broader geometric tasks such as pose and motion estimation.

Abstract

It has long been recognized that the invariance and equivariance properties of a representation are critically important for success in many vision tasks. In this paper we present Steerable Convolutional Neural Networks, an efficient and flexible class of equivariant convolutional networks. We show that steerable CNNs achieve state of the art results on the CIFAR image classification benchmark. The mathematical theory of steerable representations reveals a type system in which any steerable representation is a composition of elementary feature types, each one associated with a particular kind of symmetry. We show how the parameter cost of a steerable filter bank depends on the types of the input and output features, and show how to use this knowledge to construct CNNs that utilize parameters effectively.

Paper Structure

This paper contains 14 sections, 20 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Feature maps, fibers, and the transformation law $\pi_0$ of $\mathcal{F}_0$.
  • Figure 2: Diagram showing the structural consistency that follows from equivariance of the network $\Phi$ and the group representation structure of $\pi_0$. The result of following any path in this diagram depends only on its start and end points, not on the path taken, cf. eq. \ref{eq:pi_l_rep}.
  • Figure 3: A filter bank $\Psi$ that is $H$-equivariant. In this example, $\rho_1$ represents the $90$-degree rotation $r$ by a permutation matrix that cyclically shifts the $4$ channels.
  • Figure 4: The representation $\pi_1$ induced from the permutation representation $\rho_1$ shown in fig. 3. A single fiber is highlighted. It is transported to a new location, and acted on by $\rho_1$.
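
The captions of Figures 3 and 4 can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the paper: the function names (`rho1`, `filter_bank`, `apply_bank`, `induced_action`), the 3x3 filter size, the choice of building the bank from rotated copies of one base filter, and the rotation direction are all assumptions. It checks that such a bank of four filters turns a 90-degree rotation of the input patch into a cyclic shift of the four output channels, and that "rotate the grid, then permute each fiber" composes like the rotation group.

```python
import numpy as np

def rho1(k):
    """Permutation matrix for rotation by k * 90 degrees.

    Acts on a 4-channel fiber by cyclically shifting its entries
    (assumed convention: rho1(1) @ v equals np.roll(v, 1)).
    """
    return np.roll(np.eye(4), shift=k % 4, axis=0)

def filter_bank(psi):
    """Hypothetical bank: the four 90-degree rotations of one base filter."""
    return np.stack([np.rot90(psi, i) for i in range(4)])

def apply_bank(bank, patch):
    """Inner product of each filter with the patch; returns a 4-vector."""
    return np.tensordot(bank, patch, axes=([1, 2], [0, 1]))

def induced_action(f, k):
    """Sketch of pi_1(r^k) on a feature map f of shape (4, N, N):
    rotate the spatial grid, then act on every fiber with rho1(k)."""
    rotated = np.rot90(f, k=k, axes=(1, 2))
    return np.einsum("ij,jhw->ihw", rho1(k), rotated)

rng = np.random.default_rng(0)
psi = rng.normal(size=(3, 3))    # base filter (arbitrary values)
patch = rng.normal(size=(3, 3))  # scalar input patch

bank = filter_bank(psi)
out = apply_bank(bank, patch)
out_rot = apply_bank(bank, np.rot90(patch))

# Equivariance: rotating the input permutes the output channels.
print(np.allclose(out_rot, rho1(1) @ out))

# The induced action composes like the group of 90-degree rotations.
f = rng.normal(size=(4, 5, 5))
print(np.allclose(induced_action(induced_action(f, 1), 1),
                  induced_action(f, 2)))
```

Both checks should print `True` under these conventions. A different sign or ordering convention for the rotation would replace the shift by its inverse without changing the structure of the argument.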