
REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching

Han Nie, Bin Luo, Jun Liu, Zhitao Fu, Weixing Liu, Xin Su

TL;DR

REMM tackles the challenge of multimodal image matching under arbitrary rotations and scales by first learning modal-invariant features and then encoding rotation into descriptors with a novel cyclic shift module. The framework comprises a multimodal feature learning module and a rotation-encoding cyclic shift module, trained with a joint loss that includes keypoint detection and shift-descriptor objectives. A new rotation and scale benchmark built on OPT-SAR and OPT-NIR datasets demonstrates REMM’s superior performance and strong generalization to independent data. The results indicate that REMM provides robust, end-to-end multimodal matching with practical implications for remote sensing and cross-modal localization, outperforming both traditional and learning-based baselines in rotation-equivariant scenarios.
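The idea of "encoding rotation into descriptors" can be illustrated with a toy sketch. It assumes (our assumption, not the paper's stated layout) that each descriptor is arranged as N orientation bins with C channels per bin, so that rotating the image by 360/N degrees cyclically shifts the bin axis. Matching then tries every cyclic shift and keeps the best-scoring one, which also yields a coarse rotation estimate. The names `cyclic_shift`, `cosine`, and `best_shift` are hypothetical; REMM's actual cyclic shift module is learned and is described in the paper and its repository.

```python
import math
from typing import List, Tuple

# Hypothetical layout: N orientation bins, each holding C channel values.
Descriptor = List[List[float]]


def cyclic_shift(desc: Descriptor, k: int) -> Descriptor:
    """Shift the orientation-bin axis by k positions (bin i moves to bin i + k)."""
    n = len(desc)
    return [list(desc[(i - k) % n]) for i in range(n)]


def cosine(a: Descriptor, b: Descriptor) -> float:
    """Cosine similarity of two descriptors flattened over bins and channels."""
    fa = [x for row in a for x in row]
    fb = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def best_shift(a: Descriptor, b: Descriptor) -> Tuple[int, float, float]:
    """Try every cyclic shift of `a` against `b`.

    Returns (shift in bins, best similarity, implied rotation in degrees).
    """
    n = len(a)
    scores = [cosine(cyclic_shift(a, k), b) for k in range(n)]
    k_best = max(range(n), key=lambda k: scores[k])
    return k_best, scores[k_best], k_best * 360.0 / n


if __name__ == "__main__":
    # Toy 8-bin, 2-channel descriptor; b simulates a rotation of 3 bins (135 degrees).
    a = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.3, 0.3],
         [0.9, 0.1], [0.2, 0.8], [0.4, 0.4], [0.7, 0.6]]
    b = cyclic_shift(a, 3)
    print(best_shift(a, b))
```

Run as a script, this should report a shift of 3 bins (135 degrees) with a similarity of 1 up to floating-point rounding. The brute-force loop costs N descriptor comparisons; the same search can also be written as a circular cross-correlation over the bin axis, which is the usual route when N is large.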

Abstract

We present REMM, a rotation-equivariant framework for end-to-end multimodal image matching that encodes the rotational differences of descriptors throughout the entire matching pipeline. Previous learning-based methods focus mainly on extracting modal-invariant descriptors while largely ignoring rotational invariance. REMM comprises two components: a multimodal feature learning module and a cyclic shift module. We first learn modal-invariant features through the multimodal feature learning module. We then design the cyclic shift module to rotationally encode the descriptors, which substantially improves rotation-equivariant matching and makes the descriptors robust to arbitrary rotation angles. To validate our method, we establish a comprehensive rotation and scale matching benchmark for evaluating the rotation robustness of multimodal image matching; it combines multi-angle and multi-scale transformations applied to four publicly available datasets. Extensive experiments show that our method outperforms existing methods on this benchmark and generalizes well to independent datasets. We also analyze the key components of REMM in depth to validate the improvements brought by the cyclic shift module. Code and dataset are available at https://github.com/HanNieWHU/REMM.
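The benchmark is described as combining multi-angle and multi-scale transformations. One plausible way to produce such ground truth is a rotation-plus-scale (similarity) matrix about the image center, applied to one image of each registered pair. The sketch below builds those matrices; the helper names and the angle and scale lists are hypothetical placeholders, not the benchmark's actual settings.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 3x3 homogeneous transform


def similarity_about_center(angle_deg: float, scale: float, cx: float, cy: float) -> Matrix:
    """Rotate by angle_deg and scale by `scale` about the point (cx, cy)."""
    t = math.radians(angle_deg)
    c, s = scale * math.cos(t), scale * math.sin(t)
    # x' = c*(x - cx) - s*(y - cy) + cx
    # y' = s*(x - cx) + c*(y - cy) + cy
    return [
        [c, -s, cx - c * cx + s * cy],
        [s, c, cy - s * cx - c * cy],
        [0.0, 0.0, 1.0],
    ]


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through a 3x3 homogeneous transform."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def transform_grid(angles: Sequence[float], scales: Sequence[float],
                   width: int, height: int) -> List[Tuple[float, float, Matrix]]:
    """Every (angle, scale) combination as a ground-truth transform for one image size."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return [(a, s, similarity_about_center(a, s, cx, cy)) for a, s in product(angles, scales)]


if __name__ == "__main__":
    # Illustrative settings only; the benchmark's real angles and scales are in the paper.
    grid = transform_grid([0, 90, 180, 270], [0.8, 1.0, 1.2], 512, 512)
    print(len(grid), "transforms per image pair")
```

Because each matrix is known exactly, it doubles as the ground truth for scoring matches on the transformed pair. Warping the pixels themselves would typically be delegated to an image library (for example OpenCV's `warpPerspective`).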

Paper Structure

This paper contains 27 sections, 15 equations, 13 figures, and 9 tables.

Figures (13)

  • Figure 1: Results on our test benchmark of 33,180 image pairs drawn from four datasets. The horizontal axis shows matching success rate (SR) and the vertical axis shows RMSE. REMM achieves the second-best SR, the highest number of correct matches (NCM), and the lowest RMSE, underscoring the strong performance of our method.
  • Figure 2: Our proposed REMM framework consists of a multimodal feature learning module and a cyclic shift module.
  • Figure 3: Our proposed REMM framework consists of a multimodal feature learning module and a cyclic shift module.
  • Figure 4: The training phases of REMM.
  • Figure 5: The testing phases of REMM.
  • ...and 8 more figures
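Figure 1 reports SR, NCM, and RMSE. The exact evaluation protocol is defined in the paper; the sketch below assumes a convention common in multimodal matching work: a match is correct when its reprojection error under the ground-truth transform is at most a pixel threshold, NCM counts those matches, RMSE is computed over them, and a pair counts as a success when its NCM reaches a minimum. The 3-pixel threshold, the `min_ncm=5` cutoff, and all helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (keypoint in image A, matched keypoint in image B)


def reprojection_errors(matches: Sequence[Match],
                        to_b: Callable[[float, float], Point]) -> List[float]:
    """Distance between each matched point in B and the ground-truth projection of its A point."""
    errs = []
    for (xa, ya), (xb, yb) in matches:
        px, py = to_b(xa, ya)
        errs.append(math.hypot(px - xb, py - yb))
    return errs


def pair_metrics(matches: Sequence[Match], to_b: Callable[[float, float], Point],
                 px_thresh: float = 3.0) -> Tuple[int, float]:
    """Return (NCM, RMSE over the correct matches); RMSE is inf if none are correct."""
    correct = [e for e in reprojection_errors(matches, to_b) if e <= px_thresh]
    ncm = len(correct)
    rmse = math.sqrt(sum(e * e for e in correct) / ncm) if ncm else math.inf
    return ncm, rmse


def success_rate(per_pair: Sequence[Tuple[int, float]], min_ncm: int = 5) -> float:
    """Fraction of image pairs with at least `min_ncm` correct matches."""
    if not per_pair:
        return 0.0
    return sum(1 for ncm, _ in per_pair if ncm >= min_ncm) / len(per_pair)


if __name__ == "__main__":
    def identity(x: float, y: float) -> Point:
        return x, y

    matches = [((0.0, 0.0), (0.0, 0.0)), ((10.0, 10.0), (13.0, 14.0)),
               ((5.0, 5.0), (6.0, 5.0)), ((1.0, 1.0), (1.0, 1.0))]
    print(pair_metrics(matches, identity))
```

In the toy run the second match lies 5 pixels from its ground-truth position and is rejected, leaving an NCM of 3. Some papers instead compute RMSE from a transform estimated on the retained matches; the two conventions can rank methods differently, so the paper's own definition should be used when comparing numbers.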