3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, Taco Cohen

TL;DR

This work introduces 3D Steerable CNNs that achieve $\mathrm{SE}(3)$-equivariance by representing inputs as fields over $\mathbb{R}^3$ and learning kernels expressed as linear combinations of analytically derived steerable bases built from irreducible $\operatorname{SO}(3)$ representations. By enforcing an equivariance constraint on kernels, decomposing tensor products into irreps, and employing spherical harmonics with radial basis functions, the resulting convolutions realize the most general equivariant linear maps between field types. The approach yields data-efficient models that outperform conventional 3D CNNs on tasks with inherent $\mathrm{SE}(3)$ symmetry, such as amino acid environment prediction and protein-structure classification, while drastically reducing parameter count (e.g., from tens of millions to a few hundred thousand). The framework integrates discretization strategies, equivariant nonlinearities, and precomputation tricks to enable practical training and inference, with demonstrated success on Tetris-like rotation tasks, SHREC17 model classification, and CATH protein-architecture classification. Overall, the work provides both a theoretical foundation and a scalable, modular architecture for rotationally equivariant learning in volumetric data, with implications for structural biology and other 3D domains.
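The tensor-product decomposition mentioned above can be illustrated by simple counting. The sketch below assumes the standard rule for $\operatorname{SO}(3)$: the tensor product of the order-$j$ and order-$l$ irreps splits into the orders $J = |j-l|, \dots, j+l$, each appearing once, and the order-$J$ irrep has dimension $2J+1$. The helper names `irrep_dim` and `tensor_product_orders` are hypothetical and not taken from the paper's code.

```python
def irrep_dim(order):
    """Dimension 2*order + 1 of the order-`order` irrep of SO(3)."""
    return 2 * order + 1


def tensor_product_orders(j, l):
    """Orders J that appear when the order-j and order-l irreps are tensored."""
    return list(range(abs(j - l), j + l + 1))


for j, l in [(0, 0), (1, 0), (1, 1), (2, 1)]:
    orders = tensor_product_orders(j, l)
    # The dimensions of the pieces must add up to the dimension of the product.
    assert sum(irrep_dim(J) for J in orders) == irrep_dim(j) * irrep_dim(l)
    print(f"j={j}, l={l}: J in {orders}")
```

For $j = l = 1$ this gives $J = 0, 1, 2$, which matches the three angular basis kernels described for vector-to-vector fields in the Figure 2 caption.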

Abstract

We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.
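As a rough numerical illustration of what a steerable kernel basis means, the sketch below treats the vector-to-vector case. It assumes the kernel constraint takes the form $\kappa(Rx) = R\,\kappa(x)\,R^{\top}$ for every rotation $R$, and it uses three Cartesian stand-ins for the angular basis (identity, cross-product matrix, traceless symmetric part). These are illustrative choices, up to normalization, and not the paper's spherical-harmonic construction; the radial part is omitted and all function names are hypothetical.

```python
import numpy as np


def cross_matrix(x):
    """Matrix M with M @ v == np.cross(x, v)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])


def angular_basis(x):
    """Three 3x3 angular kernels evaluated at the direction of x (assumed basis)."""
    u = x / np.linalg.norm(x)
    return [
        np.eye(3),                         # J = 0: identity
        cross_matrix(u),                   # J = 1: antisymmetric, curl-like
        np.outer(u, u) - np.eye(3) / 3.0,  # J = 2: traceless symmetric
    ]


def random_rotation(rng):
    """Random proper rotation built from a QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))   # make the factorization sign-consistent
    if np.linalg.det(q) < 0:      # flip one column to land in SO(3)
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
weights = rng.standard_normal(3)  # one learnable weight per basis kernel


def kernel(x):
    """Kernel as a linear combination of the basis kernels."""
    return sum(w * b for w, b in zip(weights, angular_basis(x)))


R = random_rotation(rng)
x = rng.standard_normal(3)
# Steerability check: rotating the argument equals conjugating the value.
assert np.allclose(kernel(R @ x), R @ kernel(x) @ R.T)
```

Because each basis element satisfies the constraint on its own, any choice of `weights` does too, which is why only the coefficients need to be learned.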

Paper Structure

This paper contains 35 sections, 2 theorems, 22 equations, 6 figures, 4 tables.

Key Result

Lemma 1

The map $f \mapsto \kappa \cdot f$ is equivariant if and only if for all $g \in \operatorname{SE}(3)$,

Figures (6)

  • Figure 1: To transform a vector field (L) by a $90^\circ$ rotation $g$, first move each arrow to its new position (C), keeping its orientation the same, then rotate the vector itself (R). This is described by the induced representation $\pi = \operatorname{Ind}_{\operatorname{SO}(3)}^{\operatorname{SE}(3)} \rho$, where $\rho(g)$ is a $3 \times 3$ rotation matrix that mixes the three coordinate channels.
  • Figure 2: Angular part of the basis for the space of steerable kernels $\kappa^{jl}$ (for $j=l=1$, i.e. 3D vector fields as input and output). From left to right we plot three $3 \times 3$ matrices, for $|j-l| \leq J \leq j+l$, i.e. $J=0, 1, 2$. Each $3 \times 3$ matrix corresponds to one learnable parameter per radial basis function $\varphi^m$. A seasoned eye will see the identity, the curl ($\nabla \wedge$) and the gradient of the divergence ($\nabla \nabla \cdot$).
  • Figure 3: SHREC17 results [Furuya2016, Esteves2018, Tatsuma2009, s.2018spherical, Bai_2016_CVPR, kanezaki2018_rotationnet, shrec17]. Comparison of different architectures by number of parameters and score. See Table \ref{tab:shrec17} in the Supplementary Material for all the details.
  • Figure 4: Accuracy on the CATH test set as the training set size is progressively reduced.
  • Figure 5: A gated nonlinearity requires one extra scalar field (represented by gray circles with an $I$) for each non-scalar output field (represented by circles with a $\rho$). Specifically, the number of scalar output channels for the preceding convolution operator is increased by the number of features acted on by gated nonlinearities, and the extra scalar fields are computed in the same way as any other scalar field. We use sigmoid for the gate fields. In this picture, there is one scalar field in the output. It is activated with a ReLU.
  • ...and 1 more figure
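The gated nonlinearity described in the Figure 5 caption can be sketched at a single spatial location as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: non-scalar features are taken to be 3D vectors, the function name `gated_nonlinearity` and the array shapes are hypothetical, and the sigmoid gate and ReLU follow the caption.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_nonlinearity(scalars, gates, vectors):
    """Apply ReLU to scalar features and sigmoid gates to vector features.

    scalars: (n_s,)   ordinary scalar features
    gates:   (n_v,)   one extra scalar per vector feature
    vectors: (n_v, 3) vector features, one row per feature
    """
    activated_scalars = np.maximum(scalars, 0.0)
    # Each vector is rescaled by its gate; its direction is left untouched.
    gated_vectors = sigmoid(gates)[:, None] * vectors
    return activated_scalars, gated_vectors


rng = np.random.default_rng(0)
scalars = rng.standard_normal(1)       # one scalar field, as in the figure
gates = rng.standard_normal(2)         # two gate scalars
vectors = rng.standard_normal((2, 3))  # two vector features

# A 90 degree rotation about the z-axis acts on vectors but not on scalars.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

out_s, out_v = gated_nonlinearity(scalars, gates, vectors)
rot_s, rot_v = gated_nonlinearity(scalars, gates, vectors @ R.T)

# Rotating before or after the nonlinearity gives the same result.
assert np.allclose(rot_s, out_s)
assert np.allclose(rot_v, out_v @ R.T)
```

The check passes because the gate is a rotation-invariant scalar, so multiplying a vector by it commutes with any rotation of that vector.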

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Theorem 2
  • proof