DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching

Ranran Huang, Jiancheng Cai, Chao Li, Zhuoyuan Wu, Xinmin Liu, Zhenhua Chai

TL;DR

This work addresses rotation variation in local feature matching. It introduces Rotated Kernel Fusion (RKF), which rotates convolution kernels and fuses them to embed rotation invariance directly into the CNN, and Multi-oriented Feature Aggregation (MOFA), which acts as a training-time teacher to further improve robustness. Knowledge distillation from MOFA yields the distilled model, DRKF. Because re-parameterization merges the rotated kernels into a single kernel, inference costs the same as in a standard CNN. The method shows strong rotation robustness on rotated HPatches and achieves state-of-the-art Mean Average Accuracy (MAA) on the DiverseBEV aerial dataset, while remaining efficient on embedded hardware. Overall, DRKF is a practical approach to learning rotation-invariant descriptors that generalizes across viewpoints and rotations and deploys efficiently.

Abstract

The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation-invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF), which imposes rotations on the convolution kernel to improve the CNN's inherent robustness to rotation. Since RKF can be folded into a single kernel by subsequent re-parameterization, it introduces no extra computational cost at inference. Moreover, we present Multi-oriented Feature Aggregation (MOFA), which aggregates features extracted from multiple rotated versions of the input image and provides auxiliary knowledge for training RKF through distillation. We refer to the distilled RKF model as DRKF. Besides evaluating on a rotation-augmented version of the public dataset HPatches, we also contribute a new dataset named DiverseBEV, which was collected during drone flights and consists of bird's eye view images with large viewpoint changes and camera rotations. Extensive experiments show that our method outperforms other state-of-the-art techniques when exposed to large rotation variations.
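
The claim that re-parameterization removes the extra inference cost rests on the linearity of convolution: averaging the outputs of several kernels gives the same result as convolving once with the averaged kernel. The following is a minimal sketch of that identity, not the authors' implementation. The four 90-degree rotations, the averaging fusion, the single-channel input, and all function names are assumptions made for illustration.

```python
# Assumptions (not taken from the paper): four 90-degree rotations, fusion by
# averaging, a single-channel image, stride 1, "valid" cross-correlation.

def rot90(kernel):
    """Rotate a square kernel (list of rows) 90 degrees counter-clockwise."""
    n = len(kernel)
    return [[kernel[c][n - 1 - r] for c in range(n)] for r in range(n)]


def rotated_kernels(kernel, num_rotations=4):
    """Return the kernel followed by its successive 90-degree rotations."""
    kernels = [kernel]
    for _ in range(num_rotations - 1):
        kernels.append(rot90(kernels[-1]))
    return kernels


def conv2d_valid(image, kernel):
    """Cross-correlate a 2D image with a square kernel, without padding."""
    n = len(kernel)
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(n) for j in range(n))
            for x in range(width - n + 1)
        ]
        for y in range(height - n + 1)
    ]


def fuse_feature_maps(image, kernels):
    """Multi-branch view: one convolution per rotated kernel, then average."""
    maps = [conv2d_valid(image, k) for k in kernels]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[y][x] for m in maps) / len(maps) for x in range(cols)]
        for y in range(rows)
    ]


def reparameterize(kernels):
    """Single-branch view: average the rotated kernels into one kernel."""
    n = len(kernels[0])
    return [
        [sum(k[i][j] for k in kernels) / len(kernels) for j in range(n)]
        for i in range(n)
    ]


# Toy data: a 5x5 image and an asymmetric 3x3 kernel.
image = [[(3 * y + 5 * x) % 7 for x in range(5)] for y in range(5)]
base_kernel = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]

kernels = rotated_kernels(base_kernel)
multi_branch = fuse_feature_maps(image, kernels)              # four convolutions
single_kernel = conv2d_valid(image, reparameterize(kernels))  # one convolution

# Largest element-wise difference between the two views.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(multi_branch, single_kernel)
    for a, b in zip(row_a, row_b)
)
```

Under these assumptions `max_diff` is zero up to floating-point error, so the multi-branch form can be replaced by a single convolution after training. The identity holds only if the fusion is applied before any nonlinearity.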

Paper Structure (32 sections, 16 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The framework of our method. We follow the detection-and-description framework to jointly optimize the detection and description objectives. The RKF model replaces the regular convolution with RKF convolution and improves the CNN's inherent robustness to rotation (see details in Fig. 2). MOFA integrates the features extracted from multiple rotated versions of the images and serves as the teacher model, providing auxiliary supervision to the RKF model. The distillation stage produces the distilled RKF model (DRKF).
  • Figure 2: The structure of RKF convolution for a $3\times3$ kernel. RKF applies kernel rotation to the original kernel and fuses the features extracted by the multi-oriented kernels. Re-parameterization can then be used to convert the multiple kernels into a single kernel.
  • Figure 3: Visualized explanation of MOFA in the case of four rotation transformations on the input pair of images. (a) The two original corresponding keypoints $a$ and $b$ with an orientation gap of $\alpha$. (b) The transformed corresponding keypoints on two images under four rotation transformations. The orientations of $a$ and $b$ are transformed to $a_0$, $a_1$, $a_2$, $a_3$ and $b_0$, $b_1$, $b_2$, $b_3$, respectively, generating multiple correspondence pairs with an orientation gap of $\beta$. Instead of matching the two original keypoints with an orientation gap of $\alpha$ on two images, the model only needs to consider several correspondence pairs with a smaller gap of $\beta$ on the transformed keypoints.
  • Figure 4: The flight trajectory of the DiverseBEV dataset.
  • Figure 5: Mean average accuracy (MAA) curves on the DiverseBEV dataset. The left shows the MAA of image pairs satisfying both angular and distance thresholds. The middle and right are the MAA as a function of the angular threshold and distance threshold, respectively. Our DRKF outperforms other SOTA methods on the DiverseBEV dataset.
  • ...and 2 more figures
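
The orientation-gap argument in the Figure 3 caption can be checked with a short calculation. This is a hedged sketch, not the authors' code. It assumes that the four rotation transformations are evenly spaced at 90 degrees and that keypoint $a$ starts at orientation 0, neither of which the caption states. The function names are hypothetical.

```python
# Assumptions (not taken from the paper): rotations are evenly spaced
# (360 / num_rotations degrees apart) and keypoint a starts at 0 degrees.

def angular_gap(first, second):
    """Smallest absolute difference between two orientations, in [0, 180]."""
    diff = abs(first - second) % 360.0
    return min(diff, 360.0 - diff)


def smallest_pairwise_gap(alpha, num_rotations=4):
    """Return beta, the smallest gap over all rotated pairs (a_i, b_j).

    alpha is the original orientation gap between keypoints a and b, in degrees.
    """
    step = 360.0 / num_rotations
    a_orientations = [i * step for i in range(num_rotations)]          # a_0..a_3
    b_orientations = [alpha + j * step for j in range(num_rotations)]  # b_0..b_3
    return min(
        angular_gap(a, b) for a in a_orientations for b in b_orientations
    )


# Worst case of beta over all integer values of alpha.
worst_case = max(smallest_pairwise_gap(alpha) for alpha in range(360))
```

With four evenly spaced rotations, an original gap of 170 degrees leaves a best pair only 10 degrees apart, and `worst_case` is 45 degrees, so under these assumptions the model never has to bridge more than 45 degrees.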