Harmformer: Harmonic Networks Meet Transformers for Continuous Roto-Translation Equivariance

Tomáš Karella, Adam Harmanec, Jan Kotera, Jan Blažek, Filip Šroubek

TL;DR

The paper introduces the Harmformer, a harmonic transformer with a convolutional stem that is equivariant to both translation and continuous rotation. It remains stable under any continuous rotation, even without seeing rotated samples during training.

Abstract

CNNs exhibit inherent equivariance to image translation, leading to efficient parameter and data usage, faster learning, and improved robustness. The concept of translation-equivariant networks has been successfully extended to rotation, using group convolution for discrete rotation groups and harmonic functions for the continuous rotation group encompassing $360^\circ$. We explore the compatibility of the self-attention (SA) mechanism with full rotation equivariance, in contrast to previous studies that focused on discrete rotation. We introduce the Harmformer, a harmonic transformer with a convolutional stem that achieves equivariance for both translation and continuous rotation. Accompanied by an end-to-end equivariance proof, the Harmformer not only outperforms previous equivariant transformers, but also demonstrates inherent stability under any continuous rotation, even without seeing rotated samples during training.

Paper Structure

This paper contains 32 sections, 13 theorems, 41 equations, 11 figures, 10 tables.

Key Result

Lemma 4.1

Let $I$ be an input image and $W_{m_1}$ a harmonic filter. Under rotation of the image by an angle $\alpha$, the convolution of $I$ with $W_{m_1}$ changes only by a phase shift of the feature values, while the spatial coordinates are rotated accordingly. This property also holds for subsequent convolution layers. If the first feature map is denoted as $F
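The phase-shift behaviour described in this lemma can be checked numerically. The sketch below is not the paper's implementation: it assumes a filter of the form $R(r)\,e^{i m \phi}$ (the usual harmonic-network form), a hypothetical radial profile $R(r) = r\,e^{-r^2/2\sigma^2}$, and an exact $90^\circ$ rotation via `np.rot90` so that no interpolation error enters. The names `harmonic_filter` and `center_response` are illustrative. The response is evaluated only at the central pixel, which rotation maps to itself, so only the phase factor remains to be observed.

```python
import numpy as np

def harmonic_filter(size, m, sigma=2.0):
    """Complex filter R(r) * exp(i*m*phi) on an odd size x size grid (illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y)
    phi = np.arctan2(y, x)
    # Assumed radial profile; it vanishes at r = 0, where the angle is undefined.
    radial = r * np.exp(-r**2 / (2 * sigma**2))
    return radial * np.exp(1j * m * phi)

def center_response(image, filt):
    """Cross-correlation of image and filter, evaluated at the central pixel only."""
    return np.sum(filt * image)

rng = np.random.default_rng(0)
image = rng.standard_normal((9, 9))
rotated = np.rot90(image)  # exact 90-degree rotation of the input

for m in (0, 1, 2):
    filt = harmonic_filter(9, m)
    before = center_response(image, filt)
    after = center_response(rotated, filt)
    # The sign of the exponent follows from the np.rot90 and arctan2(y, x)
    # conventions used here; other conventions flip it.
    expected = np.exp(-1j * m * np.pi / 2) * before
    print(m, np.isclose(after, expected), np.isclose(abs(after), abs(before)))
```

Under these conventions the rotated response differs from the original only by the factor $e^{-i m \pi/2}$, so its magnitude is unchanged. For $m = 0$ the factor is 1 and the response is rotation invariant.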

Figures (11)

  • Figure 1: Equivariance of the Harmformer feature and attention maps in response to rotation of the input image: While the maps themselves are rotated, the magnitudes in the maps remain the same.
  • Figure 2: Overview of the Harmformer architecture, divided into four stages: S1 - downscaling the input, S2 - constructing patches from feature maps, S3 - Harmonic Encoder, and S4 - Classifier.
  • Figure 3: (a) Phase shift of the feature values when the input is rotated; (b) Harmonic Convolution (H-Conv) Block of the stem stage; (c) Interaction of harmonic filters $W_m$ with feature maps $F_m$ within the Harmonic Convolution layer of the H-Conv Block, where $m$ is the rotation order.
  • Figure 4: (a) Construction of the patches (colors represent rotation orders) and the Harmonic Encoder structure. (b) Diagram depicting the interaction of SA mechanisms across different rotation orders.
  • Figure 5: Ablation study on different normalization layers. The rows represent different normalization layers in the H-Conv block. Each plot is aggregated from 5 different runs. The error bars represent the standard deviation.
  • ...and 6 more figures

Theorems & Definitions (28)

  • Definition 3.1: Equivariance
  • Definition 3.2: Self-Attention
  • Definition 4.1: Harmonic Filter
  • Lemma 4.1: Harmonic Convolution Property
  • Definition 4.2: Harmonic Equivariance (HE)
  • Lemma 5.1: HE of Residual Connections
  • Lemma 5.2: HE of Linear Layer
  • Lemma 5.3: HE of Layer Norm
  • Lemma 5.4: Dot product subtracts rotation orders
  • Lemma 5.5: Matrix multiplication sums rotation orders
  • ...and 18 more
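
Lemmas 5.4 and 5.5 appear here by title only, so the following is a hedged illustration of the arithmetic their titles suggest, not the paper's statement. Assume a feature of rotation order $m$ picks up the phase $e^{i m \alpha}$ when the input is rotated by $\alpha$, and ignore the accompanying rotation of spatial coordinates. The symbols $q$, $k$, $A$, and $V$ are hypothetical, and the dot product is assumed to conjugate its second argument. For $q$ of order $m_1$ and $k$ of order $m_2$:

$$
\langle q, k \rangle = \sum_j q_j \overline{k_j}
\;\longmapsto\;
\sum_j e^{i m_1 \alpha} q_j \, \overline{e^{i m_2 \alpha} k_j}
= e^{i (m_1 - m_2) \alpha} \langle q, k \rangle
$$

For a matrix $A$ whose entries have order $m_1$ and a matrix $V$ whose entries have order $m_2$:

$$
(AV)_{jl} = \sum_n A_{jn} V_{nl}
\;\longmapsto\;
\sum_n e^{i m_1 \alpha} A_{jn} \, e^{i m_2 \alpha} V_{nl}
= e^{i (m_1 + m_2) \alpha} (AV)_{jl}
$$

In particular, when $m_1 = m_2$ the phase in the dot product cancels and the result has order $0$.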