Steerable CNNs
Taco S. Cohen, Max Welling
TL;DR
The paper introduces Steerable CNNs, a framework that uses representation theory to enforce equivariance to transformation groups. Feature spaces are decomposed into irreducible types, and each layer learns an intertwiner between its input and output representations, which yields parameter-efficient equivariant layers; induced representations describe how the resulting feature maps transform. Experiments on CIFAR-10/100 show improved data efficiency and state-of-the-art performance, especially in low-data regimes and with mixed capsule types. The work lays a foundation for extending steerable architectures to continuous groups and to broader geometric tasks such as pose and motion estimation.
Abstract
It has long been recognized that the invariance and equivariance properties of a representation are critically important for success in many vision tasks. In this paper we present Steerable Convolutional Neural Networks, an efficient and flexible class of equivariant convolutional networks. We show that steerable CNNs achieve state-of-the-art results on the CIFAR image classification benchmark. The mathematical theory of steerable representations reveals a type system in which any steerable representation is a composition of elementary feature types, each one associated with a particular kind of symmetry. We show how the parameter cost of a steerable filter bank depends on the types of the input and output features, and show how to use this knowledge to construct CNNs that utilize parameters effectively.
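The dependence of parameter cost on feature types can be illustrated with a small numerical sketch. It is not taken from the paper's experiments: the group (cyclic rotations by 90 degrees, C4), the two toy representations, and the helper name `intertwiner_dim` are all assumptions chosen for illustration. The sketch counts the free parameters of a linear map Phi that satisfies the equivariance constraint Phi pi(g) = rho(g) Phi, by computing the null space dimension of that linear constraint.

```python
import numpy as np

def intertwiner_dim(pi_gens, rho_gens, tol=1e-9):
    """Dimension of {Phi : Phi @ pi(g) == rho(g) @ Phi} over the given generators.

    pi_gens and rho_gens are matching lists of generator matrices for the
    input and output representations (hypothetical helper, for illustration).
    """
    n_in = pi_gens[0].shape[0]
    n_out = rho_gens[0].shape[0]
    blocks = []
    for P, R in zip(pi_gens, rho_gens):
        # Column-major vec identity: vec(Phi P - R Phi)
        #   = (P^T kron I_out - I_in kron R) vec(Phi)
        blocks.append(np.kron(P.T, np.eye(n_out)) - np.kron(np.eye(n_in), R))
    A = np.vstack(blocks)
    singular_values = np.linalg.svd(A, compute_uv=False)
    rank = int(np.sum(singular_values > tol))
    # Free parameters = unconstrained parameters minus independent constraints.
    return n_in * n_out - rank

# Assumed toy feature types for C4, each given by its single generator:
shift = np.roll(np.eye(4), 1, axis=0)  # regular representation (cyclic shift)
trivial = np.eye(1)                    # trivial (invariant) representation

for name, pi, rho in [
    ("regular -> regular", shift, shift),
    ("regular -> trivial", shift, trivial),
    ("trivial -> trivial", trivial, trivial),
]:
    free = intertwiner_dim([pi], [rho])
    unconstrained = pi.shape[0] * rho.shape[0]
    print(f"{name}: {free} free parameters (unconstrained: {unconstrained})")
```

In this toy setting the regular-to-regular map keeps 4 of its 16 unconstrained parameters (the equivariant maps are exactly the circulant matrices), and the regular-to-trivial map keeps 1 of 4. The same null space computation works for other finite groups once a list of generator matrices is supplied for each representation.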
