Moment kernels: a simple and scalable approach for equivariance to rotations and reflections in deep convolutional networks

Zachary Schlamowitz, Andrew Bennecke, Daniel J. Tward

TL;DR

The paper introduces moment kernels, a simple yet powerful form for achieving rotation and reflection equivariance in deep convolutional networks by treating feature maps as scalar, vector, or tensor fields. Moment kernels are radial functions of $|x|$ multiplied by powers of $x$ or the identity, and the authors prove that all equivariant kernels must take this form, enabling seamless use with standard convolution modules. They provide a complete derivation of equivariant transformation laws, classify kernel types (scalar-to-scalar, scalar-to-vector, vector-to-scalar, vector-to-vector, and higher-order tensors), and show how to construct moment kernels for general tensors. The approach is demonstrated on three biomedical tasks—image classification (DermaMNIST), 3D image registration (MRI), and an elliptical YOLO-based cell detector—where the moment-kernel networks deliver improved worst-case performance, faster convergence, and orientation-consistent results, while remaining interpretable and implementable within conventional CNN frameworks. The work offers a scalable, mathematically grounded alternative to representation-theory based equivariant methods, with meaningful implications for trust and robustness in biomedical imaging applications.
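To make the kernel form concrete, here is a minimal NumPy sketch that checks the transformation laws pointwise for two of the listed kernel types. The radial profile `radial`, the function names, and the specific vector-to-vector form (a radial function times the identity plus a radial function times $x x^T$) are illustrative assumptions consistent with the description above, not the authors' code.

```python
import numpy as np

def radial(r):
    # Hypothetical radial profile; any function of |x| alone would do.
    return np.exp(-r**2)

def scalar_to_vector(x):
    # f(|x|) * x : a radial function times the first power of x.
    return radial(np.linalg.norm(x)) * x

def vector_to_vector(x):
    # f0(|x|) * I + f2(|x|) * x x^T : identity term plus second power of x.
    r = np.linalg.norm(x)
    return radial(r) * np.eye(x.size) + 0.5 * radial(2 * r) * np.outer(x, x)

rng = np.random.default_rng(0)
# QR of a Gaussian matrix yields a random orthogonal matrix
# (it may be a rotation or include a reflection).
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x = rng.normal(size=3)

# Vector-valued kernel: k(Rx) = R k(x).
vec_ok = np.allclose(scalar_to_vector(R @ x), R @ scalar_to_vector(x))
# Matrix-valued kernel: k(Rx) = R k(x) R^T.
mat_ok = np.allclose(vector_to_vector(R @ x), R @ vector_to_vector(x) @ R.T)
print(vec_ok, mat_ok)
```

Both checks pass because $|Rx| = |x|$ for any orthogonal $R$, so the radial factors are unchanged and only the powers of $x$ pick up factors of $R$.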

Abstract

The principle of translation equivariance (if an input image is translated, an output image should be translated by the same amount) led to the development of convolutional neural networks that revolutionized machine vision. Other symmetries, like rotations and reflections, play a similarly critical role, especially in biomedical image analysis, but exploiting these symmetries has not seen wide adoption. We hypothesize that this is partially due to the mathematical complexity of methods used to exploit these symmetries, which often rely on representation theory, a bespoke concept in differential geometry and group theory. In this work, we show that the same equivariance can be achieved using a simple form of convolution kernels that we call "moment kernels," and prove that all equivariant kernels must take this form. These are a set of radially symmetric functions of a spatial position $x$, multiplied by powers of the components of $x$ or the identity matrix. We implement equivariant neural networks using standard convolution modules, and provide architectures to execute several biomedical image analysis tasks that depend on equivariance principles: classification (outputs are invariant under orthogonal transforms), 3D image registration (outputs transform like a vector), and cell segmentation (quadratic forms defining ellipses transform like a matrix).
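As a rough illustration of how such a kernel behaves inside an ordinary convolution, the sketch below samples a scalar-to-vector moment kernel on a pixel grid, applies it with a plain zero-padded cross-correlation, and checks equivariance under a reflection of the image columns. The helper names, the Gaussian radial profile, and the kernel size are assumptions made for this example; the check covers only one reflection, not the full orthogonal group.

```python
import numpy as np

def moment_kernel_s2v(size=5, sigma=1.0):
    # Sample f(|x|) * x on a size x size grid centred at the origin.
    p = size // 2
    u, v = np.meshgrid(np.arange(-p, p + 1), np.arange(-p, p + 1), indexing="ij")
    f = np.exp(-(u**2 + v**2) / (2 * sigma**2))  # hypothetical radial profile
    # Component 0 points along rows (axis 0), component 1 along columns (axis 1).
    return np.stack([f * u, f * v])

def correlate_same(img, ker):
    # Zero-padded cross-correlation with output the same size as the input.
    p = ker.shape[0] // 2
    padded = np.pad(img, p)
    out = np.zeros(img.shape)
    for a in range(ker.shape[0]):
        for b in range(ker.shape[1]):
            out += ker[a, b] * padded[a:a + img.shape[0], b:b + img.shape[1]]
    return out

def apply_s2v(img, ker):
    # Scalar image in, two-component vector field out.
    return np.stack([correlate_same(img, k) for k in ker])

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
ker = moment_kernel_s2v()

out = apply_s2v(img, ker)
out_of_flipped = apply_s2v(img[:, ::-1], ker)

# Reflecting the columns should relocate each output vector
# and negate its column component, leaving the row component unchanged.
expected = out[:, :, ::-1] * np.array([1.0, -1.0])[:, None, None]
equivariant = np.allclose(out_of_flipped, expected)
print(equivariant)
```

The check relies on an odd kernel size and symmetric zero padding, so that flipping the array is an exact symmetry of the discrete operation. Rotations by angles other than multiples of 90 degrees would only hold approximately on a pixel grid.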

Paper Structure

This paper contains 56 sections, 46 equations, 3 figures.

Figures (3)

  • Figure 1: Classification performance quantified using area under the ROC curve (AUC), accuracy, and "worst case accuracy" on the DermaMNIST test set. Our experimental results are shown to the left, including 20 bootstrap samples to indicate variability. In green, results published by MedMNIST are shown.
  • Figure 2: 3D image registration performance. Left: columns show 3 views of the same brain MRI. The left column shows the input and the right column shows the desired output in a standard orientation. The second and third columns show the outputs of our model and a comparable standard model. Middle: Accuracy is quantified with 3 different error measures: MSE, worst case MSE, and mean square distance between two examples of the same image viewed at different orientations. 100 samples are also shown to visualize the distribution. Right: training MSE is shown as a function of iteration (epoch).
  • Figure 3: Results of our elliptical YOLO model for cell detection. Panel 1: Example input image and ground truth bounding ellipses, showing three classes via color. Panel 2: Output of our model as filled ellipses. Panels 3, 4: Two outputs of a standard model, viewed by the network in different orientations. White arrows show where detections vary between 3 and 4. Panel 5: Contours superimposed for all eight views, illustrating variability.