Multimodal Image Matching based on Frequency-domain Information of Local Energy Response

Meng Yang, Jun Chen, Wenping Gong, Longsheng Wei, Xin Tian

TL;DR

This work tackles the robustness gap in multimodal image matching caused by nonlinear intensity differences, local distortions, and rotations. It introduces FILER, which grounds detection and description in a local energy response model derived from multi-scale, multi-orientation frequency-domain filters, and couples this with an edge-structure enhanced detector and a convolutional feature weighted log-polar descriptor to achieve rotation-invariant matching. The approach demonstrates strong performance across remote sensing, computer vision, and medical datasets, outperforming several state-of-the-art methods in both detection quality and descriptor robustness, and showing resilience to various noise types and rotations. The results suggest FILER’s components can be integrated into trainable architectures to further boost multimodal matching and fusion tasks in diverse practical settings.

Abstract

Complicated nonlinear intensity differences, nonlinear local geometric distortions, noise, and rotation are the main challenges in multimodal image matching. To address these problems, we propose a method based on Frequency-domain Information of Local Energy Response, called FILER. The core of FILER is a local energy response model based on frequency-domain information, which can overcome the effect of nonlinear intensity differences. To improve robustness to local nonlinear geometric distortions and to noise, we design a new edge structure enhanced feature detector and a convolutional feature weighted descriptor, respectively. In addition, FILER overcomes the sensitivity of the frequency-domain information to the rotation angle and achieves rotation invariance. Extensive experiments on multimodal image pairs show that FILER outperforms other state-of-the-art algorithms and has good robustness and universality.
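As a rough illustration of what a local energy response built from multi-scale, multi-orientation frequency-domain filters might look like, the NumPy sketch below uses a log-Gabor filter bank. The abstract does not name the filter family or its parameters, so the log-Gabor choice, the function names `log_gabor_bank` and `local_energy`, and every default value here are assumptions rather than FILER's actual formulation.

```python
import numpy as np

def log_gabor_bank(shape, n_scales=4, n_orients=6, min_wavelength=3.0,
                   mult=2.0, sigma_f=0.55, sigma_theta=0.4):
    """Build a (n_scales, n_orients, rows, cols) bank of frequency-domain filters.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequencies, shape (rows, 1)
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequencies, shape (1, cols)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                   # placeholder to avoid log(0) at DC
    theta = np.arctan2(-fy, fx)

    filters = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)   # centre frequency of this scale
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_f) ** 2))
        radial[0, 0] = 0.0                        # zero response at DC
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # Angular distance to the filter orientation, wrapped to [0, pi]
            d_theta = np.abs(np.arctan2(np.sin(theta - angle),
                                        np.cos(theta - angle)))
            angular = np.exp(-d_theta ** 2 / (2 * sigma_theta ** 2))
            filters.append(radial * angular)
    return np.stack(filters).reshape(n_scales, n_orients, rows, cols)

def local_energy(image, bank):
    """Return (total energy map, per-orientation energy maps)."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    # Filtering is a product in the frequency domain; ifft2 acts on the last two axes.
    responses = np.fft.ifft2(spectrum * bank)
    amplitude = np.abs(responses)          # magnitude of the complex (even/odd) response
    per_orient = amplitude.sum(axis=0)     # accumulate over scales
    total = per_orient.sum(axis=0)         # accumulate over orientations
    return total, per_orient
```

Because these filters have zero response at DC and only amplitudes are kept, `local_energy(img, bank)` and `local_energy(255 - img, bank)` produce the same maps. Contrast inversion is only one simple case of the nonlinear intensity differences the abstract refers to, but it shows why an amplitude-based energy map is a plausible starting point.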

Paper Structure

This paper contains 30 sections, 16 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: Feature detection. (a) is the input remote sensing cross-season image pair. (b) and (c) are the total energy response map and the intermediate modal image $I_{out}$ of the remote sensing cross-season images, respectively. (d) is the result of feature detection on the input images, and (e) is the result of feature detection on $I_{out}$, where the red points are unmatched outliers and the blue '$*$' markers are successfully matched inliers.
  • Figure 2: Matching results of descriptors constructed from the total energy response and from the energy manifold vector field. The images in (a) are the total energy response maps of the SAR-optical image pair, and the images in (b) are their energy manifold vector fields. (c) shows the matches established using (a) as the feature description map, and (d) shows the matches established using (b).
  • Figure 3: The energy manifold vector field and the log-polar grid of the feature description map. (a) is the remote sensing optical cross-temporal image pair. (b) is the feature description map of (a), and (c) is the energy manifold vector field of (b). (d) is a circular log-polar grid with 36 location bins, each of which contains a distribution histogram with $N_o$ bins.
  • Figure 4: Sensitivity of the feature description map to rotation angle. (a) and (b) form a remote sensing cross-season image pair. (c) and (d) are the feature description maps of (a) and (b), respectively. (g) is the feature description map of (b) rotated $90^{\circ}$ counterclockwise. (e) is the difference map between (c) and (d). (h) is the difference map between (c) and (g). (f) is the matching result based on (c) and (d). (i) is the matching result based on (c) and (g).
  • Figure 5: Matching results and difference maps at different angle errors $[0^{\circ}, 5^{\circ}, 10^{\circ}, 15^{\circ}, 20^{\circ}, 25^{\circ}]$. In each group of images, the left image is the matching result and the right image is the difference map of the feature description maps for a $50^{\circ}$ rotation angle.
  • ...and 6 more figures
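
The log-polar grid in Figure 3(d) can be sketched as follows. The caption gives 36 location bins with an $N_o$-bin histogram in each; the split into 3 rings by 12 sectors, the log-spaced ring edges, the patch radius, the L2 normalization, and the function name `log_polar_descriptor` are assumptions made for illustration, as is the use of a per-pixel orientation index map with a weight map as input.

```python
import numpy as np

def log_polar_descriptor(orient_index, weight, center, radius=48,
                         n_rings=3, n_sectors=12, n_o=6):
    """Accumulate weighted orientation histograms over a log-polar grid.

    orient_index: integer map with values in [0, n_o), one orientation bin per pixel.
    weight:       non-negative map of the same shape (e.g. an energy response).
    center:       (row, col) of the keypoint.
    Returns a flattened, L2-normalised vector of length n_rings * n_sectors * n_o.
    """
    cy, cx = center
    rows, cols = orient_index.shape
    # Clip the square patch around the keypoint to the image bounds
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, rows)
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, cols)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    dy, dx = ys - cy, xs - cx

    r = np.hypot(dx, dy)
    inside = r <= radius                        # keep only the circular support

    # Ring index from log-spaced radial edges (an assumed spacing)
    ring = np.floor(n_rings * np.log1p(r) / np.log1p(radius)).astype(int)
    ring = np.clip(ring, 0, n_rings - 1)

    # Sector index from the polar angle in [0, 2*pi)
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    sector = np.floor(ang / (2 * np.pi) * n_sectors).astype(int) % n_sectors

    loc = ring * n_sectors + sector             # location bin in [0, 36) by default

    desc = np.zeros((n_rings * n_sectors, n_o))
    patch_idx = orient_index[y0:y1, x0:x1]
    patch_w = weight[y0:y1, x0:x1]
    # Unbuffered add so repeated (location, orientation) pairs all accumulate
    np.add.at(desc, (loc[inside], patch_idx[inside]), patch_w[inside])

    norm = np.linalg.norm(desc)
    return (desc / norm).ravel() if norm > 0 else desc.ravel()
```

With the defaults above the descriptor has 36 × 6 = 216 dimensions. The value of $N_o$ is not stated in the caption, so `n_o=6` is only a placeholder.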