Steerers: A framework for rotation equivariant keypoint descriptors

Georg Bökman, Johan Edstedt, Michael Felsberg, Fredrik Kahl

TL;DR

The paper addresses rotation robustness in learned keypoint descriptors by introducing steerers: linear transforms in description space that encode rotations of the input image, making descriptors rotation-equivariant at minimal runtime cost. Grounded in the representation theory of $C_4$ and $SO(2)$, it considers three training settings (A: optimizing a steerer for a fixed descriptor, B: jointly optimizing steerer and descriptor, C: optimizing a descriptor for a fixed steerer) and reports state-of-the-art rotation-invariant matching on AIMS and Roto-360 while preserving upright performance on MegaDepth. It analyzes the representation-theoretic structure of steerers, explores matching strategies for equivariant descriptions, and investigates how steerer eigenvalues affect performance and training dynamics. The result is an efficient approach to rotation-robust image matching that can be integrated with existing local-feature pipelines, supporting 3D reconstruction in non-upright scenarios.

Abstract

Image keypoint descriptions that are discriminative and matchable over large changes in viewpoint are vital for 3D reconstruction. However, descriptions output by learned descriptors are typically not robust to camera rotation. While they can be made more robust by, e.g., data augmentation, this degrades performance on upright images. Another approach is test-time augmentation, which incurs a significant increase in runtime. Instead, we learn a linear transform in description space that encodes rotations of the input image. We call this linear transform a steerer since it allows us to transform the descriptions as if the image were rotated. From representation theory, we know all possible steerers for the rotation group. We can (A) optimize a steerer given a fixed descriptor, (B) optimize a steerer jointly with a descriptor, or (C) optimize a descriptor given a fixed steerer. We perform experiments in these three settings and obtain state-of-the-art results on the rotation-invariant image matching benchmarks AIMS and Roto-360. We publish code and model weights at https://github.com/georg-bn/rotation-steerers.
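To make the idea of a steerer concrete, here is a minimal sketch for the four-element rotation group $C_4$. It assumes a toy "descriptor" that simply flattens a 2x2 patch; `describe`, `fit_c4_steerer`, and `best_rotation` are hypothetical names for illustration and are not taken from the paper or its repository. The sketch builds a matrix `rho` such that multiplying the upright description by `rho` gives the description of the patch rotated by 90 degrees, then uses `rho` to recover an unknown rotation by comparing all four steered descriptions, without re-describing a rotated image.

```python
import numpy as np

def describe(patch):
    """Toy descriptor (hypothetical): flatten a 2x2 patch into a 4-vector."""
    return np.asarray(patch, dtype=float).reshape(-1)

def fit_c4_steerer():
    """Build the 4x4 steerer of the toy descriptor.

    Column d is the description of the d-th basis patch after a
    90-degree rotation. Because the toy descriptor is linear, this
    matrix steers every description, not only the basis ones.
    """
    rho = np.zeros((4, 4))
    for d in range(4):
        basis = np.zeros(4)
        basis[d] = 1.0
        rho[:, d] = describe(np.rot90(basis.reshape(2, 2)))
    return rho

def best_rotation(desc_a, desc_b, rho):
    """Return k in {0, 1, 2, 3} maximizing <rho^k desc_a, desc_b>."""
    scores = [np.linalg.matrix_power(rho, k) @ desc_a @ desc_b for k in range(4)]
    return int(np.argmax(scores))

rho = fit_c4_steerer()
patch = np.random.default_rng(0).normal(size=(2, 2))

# Steering the upright description reproduces the description of the
# rotated patch, so the descriptor only has to run once.
for k in range(4):
    steered = np.linalg.matrix_power(rho, k) @ describe(patch)
    assert np.allclose(steered, describe(np.rot90(patch, k)))

# Recover an unknown rotation (three quarter turns) from descriptions alone.
best_k = best_rotation(describe(patch), describe(np.rot90(patch, 3)), rho)
```

For a learned descriptor the steerer is not known in closed form like this; the abstract's settings (A) to (C) describe different ways of obtaining a descriptor and steerer pair that satisfy the same relation.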

Paper Structure

This paper contains 28 sections, 4 theorems, 20 equations, 8 figures, 6 tables.

Key Result

Theorem 4.1

Let $\rho$ be a representation of $C_4$ on $\mathbb{R}^D$. Then, there exists an invertible matrix $Q$ and $j_d\in\{0, 1, 2, 3\}$ such that
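The displayed formula of the theorem is not reproduced in this summary. As a hedged illustration of the kind of structure it concerns, the sketch below relies only on a standard fact: if $\rho(\mathbf{g})$ generates a representation of $C_4$, then $\rho(\mathbf{g})^4 = I$, so every eigenvalue is a fourth root of unity $i^{j}$ with $j\in\{0,1,2,3\}$. The example representation (a cyclic shift, a sign flip, and a trivial channel, hidden behind a change of basis `Q`) is an assumption chosen for illustration, not taken from the paper.

```python
import numpy as np

# Assumed example: a 6-dimensional representation of C4 built from
# a cyclic shift of 4 channels, one sign-flipping channel and one trivial channel.
shift = np.roll(np.eye(4), 1, axis=0)      # permutation matrix of order 4
rho_simple = np.zeros((6, 6))
rho_simple[:4, :4] = shift
rho_simple[4, 4] = -1.0
rho_simple[5, 5] = 1.0

# Hide the structure behind an (assumed) invertible change of basis Q.
rng = np.random.default_rng(0)
Q = np.eye(6) + 0.1 * rng.normal(size=(6, 6))
rho_g = np.linalg.inv(Q) @ rho_simple @ Q

# rho_g is still a C4 generator: four applications give the identity.
assert np.allclose(np.linalg.matrix_power(rho_g, 4), np.eye(6))

# Each eigenvalue is (numerically) i**j; read off the exponent j from its angle.
eigvals = np.linalg.eigvals(rho_g)
j = np.round(np.angle(eigvals) / (np.pi / 2)).astype(int) % 4
exponents = sorted(j.tolist())
```

The exponents recovered here are independent of `Q`, which is why the eigenvalue distribution is a natural way to compare steerers.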

Figures (8)

  • Figure 1: Matching under large in-plane rotations. Two challenging pairs from AIMS [stoken2023astronaut]. The left image in each pair was taken by astronauts on the ISS and is geo-referenced by matching it with the satellite image on the right. We plot estimated inlier correspondences after homography estimation with RANSAC. Further qualitative examples are shown in the appendix.
  • Figure 2: Overview of approach. A steerer (Definition 4.4) is a linear map that transforms the description of a keypoint into the description of the corresponding keypoint in a rotated image. Thus, a steerer makes the keypoint descriptor rotation equivariant, and we can obtain the descriptions of keypoints in arbitrarily rotated images while only running the descriptor once.
  • Figure 3: Equivariance of Upright SIFT. Left: A keypoint with its Upright SIFT description in an upright image and a rotated version. The small yellow squares are the subregions where histograms of gradient orientations are computed. Right: The Upright SIFT descriptions unravelled into the 128 bin histograms that constitute them. When we rotate the image, the subregions are permuted, and the histogram bins within each subregion are further permuted cyclically. Hence, Upright SIFT is rotation equivariant.
  • Figure 4: Training evolution of eigenvalue distributions of steerers. We plot the eigenvalue distribution of $C_4$-steerers $\rho(\mathbf{g})$ (first three columns) and Lie algebra generators $\mathrm{d}\varsigma$ for $\mathrm{SO}(2)$-steerers (last two columns) in the complex plane, with different initializations when trained jointly with a descriptor. The top row depicts the eigenvalues at the start of training, and the bottom row at the end. There are $D=256$ eigenvalues in every plot. Many congregate at the "admissible" eigenvalues described in the paper's section on equivariance and steerers, but some do not; see the paper's discussion of training dynamics. These visualizations highlight the initialization sensitivity of the steerer. GIF animations of the training evolution are available at https://github.com/georg-bn/rotation-steerers.
  • Figure 5: More qualitative challenging matching examples from the AIMS data.
  • ...and 3 more figures
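
The Figure 4 caption refers to Lie algebra generators $\mathrm{d}\varsigma$ for $\mathrm{SO}(2)$-steerers. The sketch below assumes the standard relation for one-parameter matrix groups, namely that the steerer for angle $\theta$ is $\exp(\theta\,\mathrm{d}\varsigma)$; the paper's exact parametrization is not shown in this summary. Under that assumption, a rotation by $2\pi$ must act as the identity, which holds when the generator's eigenvalues are integer multiples of $i$. The helper names `generator` and `steer` and the block-diagonal form are hypothetical choices for illustration.

```python
import numpy as np

def generator(freqs):
    """Block-diagonal generator with one 2x2 block [[0, -j], [j, 0]] per frequency j."""
    dim = 2 * len(freqs)
    A = np.zeros((dim, dim))
    for n, j in enumerate(freqs):
        A[2 * n, 2 * n + 1] = -j
        A[2 * n + 1, 2 * n] = j
    return A

def steer(A, theta):
    """exp(theta * A), computed by eigendecomposition (assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(theta * w)) @ np.linalg.inv(V)).real

A_int = generator([1, 2, 3])    # eigenvalues +-1i, +-2i, +-3i
A_half = generator([0.5])       # eigenvalues +-0.5i

# Integer frequencies: a full turn is the identity, and angles add up.
full_turn_int = steer(A_int, 2 * np.pi)
composed = steer(A_int, 0.3) @ steer(A_int, 0.4)

# Non-integer frequency: a full turn flips the sign, so this generator
# does not define a consistent steerer for image rotations.
full_turn_half = steer(A_half, 2 * np.pi)
```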

Theorems & Definitions (17)

  • Definition 3.1
  • Example 3.1
  • Example 3.2
  • Example 4.1
  • Definition 4.1: Equivariance
  • Definition 4.2: Equivariance of keypoint descriptor
  • Example 4.2
  • Definition 4.3: Steerability, adapted from [freeman1991design]
  • Definition 4.4: Steerer
  • Example 4.3
  • ...and 7 more