Multimodal Image Matching based on Frequency-domain Information of Local Energy Response
Meng Yang, Jun Chen, Wenping Gong, Longsheng Wei, Xin Tian
TL;DR
This work tackles the robustness gap in multimodal image matching caused by nonlinear intensity differences, local distortions, and rotations. It introduces FILER, which grounds detection and description in a local energy response model derived from multi-scale, multi-orientation frequency-domain filters, and couples this with an edge-structure enhanced detector and a convolutional feature weighted log-polar descriptor to achieve rotation-invariant matching. The approach demonstrates strong performance across remote sensing, computer vision, and medical datasets, outperforming several state-of-the-art methods in both detection quality and descriptor robustness, and showing resilience to various noise types and rotations. The results suggest FILER’s components can be integrated into trainable architectures to further boost multimodal matching and fusion tasks in diverse practical settings.
Abstract
Complicated nonlinear intensity differences, nonlinear local geometric distortions, noise, and rotation are the main challenges in multimodal image matching. To address these problems, we propose a method based on Frequency-domain Information of Local Energy Response, called FILER. The core of FILER is a local energy response model built on frequency-domain information, which overcomes the effect of nonlinear intensity differences. To improve robustness to local nonlinear geometric distortions and to noise, we design a new edge-structure-enhanced feature detector and a convolutional-feature-weighted descriptor, respectively. In addition, FILER overcomes the sensitivity of frequency-domain information to the rotation angle and achieves rotation invariance. Extensive experiments on multimodal image pairs show that FILER outperforms other state-of-the-art algorithms and has good robustness and universality.
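
To make the idea of a local energy response concrete, the following is a minimal sketch, not FILER's actual model. The abstract does not name the filters, so the sketch assumes a log-Gabor bank, a common choice for multi-scale, multi-orientation frequency-domain filtering. The function names (`log_gabor_bank`, `local_energy`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_gabor_bank(shape, n_scales=4, n_orients=6,
                   min_wavelength=3.0, mult=2.0, sigma_f=0.55):
    """Build an (n_orients, n_scales, rows, cols) bank of frequency-domain filters.

    Assumption: log-Gabor filters; the abstract does not specify the filter type.
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                   # avoid log(0) at the DC term
    theta = np.arctan2(-fy, fx)

    bank = np.zeros((n_orients, n_scales, rows, cols))
    for o in range(n_orients):
        angle = o * np.pi / n_orients
        # Wrapped angular distance to the filter orientation.
        d_theta = np.abs(np.arctan2(np.sin(theta - angle),
                                    np.cos(theta - angle)))
        spread = np.exp(-d_theta ** 2 / (2 * (np.pi / n_orients) ** 2))
        for s in range(n_scales):
            f0 = 1.0 / (min_wavelength * mult ** s)  # centre frequency
            radial = np.exp(-np.log(radius / f0) ** 2
                            / (2 * np.log(sigma_f) ** 2))
            radial[0, 0] = 0.0           # zero response to the mean intensity
            bank[o, s] = radial * spread
    return bank

def local_energy(image, **bank_kwargs):
    """Sum, over orientations, the magnitude of the scale-summed complex response."""
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(img)
    bank = log_gabor_bank(img.shape, **bank_kwargs)
    energy = np.zeros(img.shape)
    for o in range(bank.shape[0]):
        # Real part = even-symmetric response, imaginary part = odd-symmetric.
        responses = np.fft.ifft2(spectrum[None] * bank[o])
        energy += np.abs(responses.sum(axis=0))
    return energy
```

Because the filters have zero DC response and the energy is a magnitude, this sketch gives the same map for an image and its contrast-inverted copy (`local_energy(img)` versus `local_energy(1.0 - img)`). That is only the simplest case of an intensity difference between modalities, but it illustrates why an energy-based representation is less tied to raw intensities than the pixel values themselves.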
