PRISM: PRogressive dependency maxImization for Scale-invariant image Matching

Xudong Cai, Yongcai Wang, Lun Luo, Minhang Wang, Deying Li, Jintao Xu, Weihao Gu, Rui Ai

TL;DR

PRISM tackles detector-free image matching by introducing progressive patch pruning and scale-aware attention. It progressively prunes irrelevant patches using Multi-scale Pruning Modules (MPM) that maximize cross-image dependency via Normalized Mutual Information, and employs Scale-Aware Dynamic Pruning Attention (SADPA) to fuse multi-scale context with reduced interference. The method yields state-of-the-art results across homography estimation, relative pose estimation, and visual localization, while demonstrating strong generalization and efficiency. This approach advances robust dense matching by focusing computation on informative regions and modeling scale variations within a unified attention framework.

Abstract

Image matching aims to identify corresponding points between a pair of images. Detector-free methods currently show impressive performance in challenging scenarios, thanks to their ability to generate dense matches and their global receptive field. However, performing feature interaction and proposing matches across the entire image is unnecessary, because not all image regions contribute to the matching process. Interacting and matching in unmatchable areas can introduce errors, reducing both matching accuracy and efficiency. Meanwhile, scale discrepancy between the two images still troubles existing methods. To address these issues, we propose PRogressive dependency maxImization for Scale-invariant image Matching (PRISM), which jointly prunes irrelevant patch features and tackles the scale discrepancy. To do this, we first present a Multi-scale Pruning Module (MPM) that adaptively prunes irrelevant features by maximizing the dependency between the two feature sets. Moreover, we design Scale-Aware Dynamic Pruning Attention (SADPA) to aggregate information from different scales via a hierarchical design. Our method's superior matching performance and generalization capability are confirmed by leading accuracy across various evaluation benchmarks and downstream tasks. The code is publicly available at https://github.com/Master-cai/PRISM.
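The dependency that the MPM maximizes is measured with Normalized Mutual Information (NMI), according to the TL;DR. As a rough illustration of what NMI quantifies, the sketch below computes it for two discrete variables from a joint probability table. The function name `normalized_mutual_information` and the normalization $2I(X;Y)/(H(X)+H(Y))$ are assumptions chosen for illustration; this page does not state which estimator or normalization PRISM applies to feature sets.

```python
import numpy as np


def normalized_mutual_information(joint):
    """NMI of two discrete variables given their joint probability table.

    Hypothetical helper for illustration only. It uses the normalization
    2 * I(X;Y) / (H(X) + H(Y)), which is one of several common choices
    and is not necessarily the one used by PRISM.
    """
    joint = np.asarray(joint, dtype=np.float64)
    joint = joint / joint.sum()   # make sure the table sums to 1
    px = joint.sum(axis=1)        # marginal distribution of X
    py = joint.sum(axis=0)        # marginal distribution of Y

    def entropy(p):
        p = p[p > 0]              # treat 0 * log(0) as 0
        return float(-(p * np.log(p)).sum())

    hx, hy, hxy = entropy(px), entropy(py), entropy(joint.ravel())
    mutual_info = hx + hy - hxy   # I(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = hx + hy
    return 2.0 * mutual_info / denom if denom > 0 else 0.0


# Independent variables share no information; identical ones share all of it.
independent = np.outer([0.5, 0.5], [0.5, 0.5])
identical = np.eye(2) / 2.0
nmi_independent = normalized_mutual_information(independent)
nmi_identical = normalized_mutual_information(identical)
```

Under this normalization the score lies in [0, 1], so a pruning step that raises it keeps the parts of the two feature sets that are most informative about each other.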

Paper Structure (46 sections, 18 equations, 8 figures, 12 tables, 1 algorithm)

Figures (8)

  • Figure 1: The basic idea of our proposed method. Given two images, not all image patches are helpful to the matching process. Conducting feature interactions and searching for matches across the entire image can be detrimental (Without Pruning). We propose to gradually prune the irrelevant patches by maximizing the dependency between the two images, resulting in more robust and accurate matches (With Pruning). (c) shows the pruning masks estimated by successive MPMs. The MPMs prune irrelevant patches successively from shallow to deep layers ($\blacksquare \rightarrow \blacksquare \rightarrow \blacksquare \rightarrow \blacksquare$). Feature interactions and match searches are conducted only in the white mask regions.
  • Figure 2: Overview of PRISM. PRISM starts from a CNN-based backbone that extracts coarse-level features $F^A_c, F^B_c$ and fine-level features $F^A_f, F^B_f$. $F^A_c, F^B_c$ are fed into the proposed iterative Multi-scale Pruning Module (MPM) for updating and pruning (see the MPM section). In each MPM layer, the features are first transformed by self- and cross-SADPA with a hierarchical design (see the SADPA section) to aggregate information from selected features of various scales. Then the Patch Pruning module (see the Patch Pruning section) eliminates irrelevant features to maximize the NMI between the two feature sets. After $L$ MPM blocks, the final features $F_L^A$ and $F_L^B$ are used to compute the coarse matching matrix by Weighted Dual-softmax (see the matching matrix section). Finally, the mutual nearest neighbor strategy and the threshold $\theta_c$ are used to select the valid coarse matches $\mathcal{M}_c$. $\mathcal{M}_c$ is then projected onto the fine-level feature maps $F^A_f, F^B_f$ and refined to sub-pixel matches $\mathcal{M}_f$.
  • Figure 3: Visualization of pruning masks and matching results on the MegaDepth dataset. Patch pruning identifies redundant image patches and gradually excludes them from subsequent feature interactions (from shallow to deep layers, $\blacksquare \rightarrow \blacksquare \rightarrow \blacksquare \rightarrow \blacksquare$). It avoids most incorrect matches.
  • Figure 4: Qualitative Results. We compare PRISM with SuperPoint (SP) + LightGlue (LG) and LoFTR on the ScanNet and MegaDepth datasets. As shown in the figure, PRISM generates denser matches and avoids most outliers in both indoor and outdoor scenes. Red indicates an epipolar error beyond $5 \times 10^{-4}$ (in normalized image coordinates). More visualizations are provided in the Appendix.
  • Figure 5: The attention mechanism. $\otimes$ denotes the product and $\oplus$ denotes summation. RoPE accounts for the relative distance between query and key when computing the attention score $a_{ij}$.
  • ...and 3 more figures
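
The coarse matching step described in the Figure 2 caption (a dual-softmax matching matrix, followed by mutual nearest neighbor filtering and the threshold $\theta_c$) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the plain, unweighted dual-softmax because the weighting scheme is not described on this page, and the names `coarse_matches` and `temperature` and the default values are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def coarse_matches(feat_a, feat_b, temperature=0.1, theta_c=0.2):
    """Select coarse matches between two sets of patch features.

    feat_a: (N, D) features of image A; feat_b: (M, D) features of image B.
    Assumption: plain dual-softmax; PRISM's weighted variant is not shown here.
    Returns an array of (index_a, index_b) pairs and the confidence matrix.
    """
    sim = feat_a @ feat_b.T / temperature                # (N, M) similarity scores
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)   # dual-softmax confidence

    best_b_for_a = conf.argmax(axis=1)   # best column for every row
    best_a_for_b = conf.argmax(axis=0)   # best row for every column
    idx_a = np.arange(conf.shape[0])

    # Mutual nearest neighbor: A's best match must point back to A.
    mutual = best_a_for_b[best_b_for_a] == idx_a
    # Confidence threshold theta_c removes weak matches.
    keep = mutual & (conf[idx_a, best_b_for_a] > theta_c)
    return np.stack([idx_a[keep], best_b_for_a[keep]], axis=1), conf


# Toy example: image B holds the same three features in permuted order.
feat_a = np.eye(3)
feat_b = np.eye(3)[[2, 0, 1]]
matches, conf = coarse_matches(feat_a, feat_b)
```

In the toy example each feature of A is recovered at its permuted position in B. In a pruned setting, rows and columns belonging to pruned patches would simply be excluded before this step.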