
Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching

Xiaoyong Lu, Songlin Du

TL;DR

RCM addresses three core bottlenecks in local feature matching—scarcity of matchable points in small-scale scenes, matching conflicts under large scale variation, and reliance on keypoint repeatability—by coupling a dynamic view switching mechanism with a conflict-free, many-to-one coarse matcher in a semi-sparse coarse-to-fine architecture. The view switcher increases usable matches in the source image, while the dustbin-enabled many-to-one coarse matching resolves target-image conflicts, together substantially raising the practical matching ceiling. Extensive experiments across HPatches, MegaDepth, ScanNet, and Aachen Day-Night demonstrate strong accuracy and competitive efficiency, with notable gains in ground-truth matches (up to $260\%$) and faster performance for the semi-sparse variant. The approach offers robust generalization and is well-suited for real-time and large-scale vision tasks, including localization and pose estimation, without task-specific fine-tuning.

Abstract

Current feature matching methods prioritize improving modeling capabilities to better align outputs with ground-truth matches, which are the theoretical upper bound on matching results, metaphorically depicted as the "ceiling". However, these enhancements fail to address the underlying issues that directly hinder ground-truth matches, including the scarcity of matchable points in small-scale images, matching conflicts in dense methods, and the reliance on keypoint repeatability in sparse methods. We propose a novel feature matching method named RCM, which Raises the Ceiling of Matching from three aspects. 1) RCM introduces a dynamic view switching mechanism to address the scarcity of matchable points in source images by strategically switching image pairs. 2) RCM proposes a conflict-free coarse matching module, addressing matching conflicts in the target image through a many-to-one matching strategy. 3) By integrating the semi-sparse paradigm and the coarse-to-fine architecture, RCM preserves the benefits of both high efficiency and global search, mitigating the reliance on keypoint repeatability. As a result, RCM enables more matchable points in the source image to be matched in an exhaustive and conflict-free manner in the target image, leading to a substantial 260% increase in ground-truth matches. Comprehensive experiments show that RCM exhibits remarkable performance and efficiency in comparison to state-of-the-art methods.
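The many-to-one coarse matching idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes a precomputed similarity matrix between sparse source features and dense target features, a fixed (rather than learned) dustbin score, and a hypothetical confidence threshold. The function name `many_to_one_match` and its parameters are illustrative only. Each source feature independently picks its best target cell or the dustbin, so several source features may share one target cell without conflict.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def many_to_one_match(scores, dustbin_score=0.0, threshold=0.2):
    """Assign each source feature to at most one target feature.

    scores[i][j] is the similarity between source feature i and target
    feature j. The dustbin score and threshold are assumptions made for
    this sketch; a real matcher would typically learn the dustbin.
    Returns a list of (source_index, target_index) pairs. Different
    source indices may map to the same target index (many-to-one).
    """
    matches = []
    for i, row in enumerate(scores):
        # Append the dustbin as an extra "no match" column for this row.
        probs = softmax(list(row) + [dustbin_score])
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == len(row):
            continue  # the dustbin won: source feature i stays unmatched
        if probs[best] < threshold:
            continue  # match is too uncertain to keep
        matches.append((i, best))
    return matches


if __name__ == "__main__":
    demo_scores = [
        [5.0, 0.0, 0.0],     # strongly prefers target 0
        [4.0, 0.0, 0.0],     # also prefers target 0 (allowed here)
        [-3.0, -3.0, -3.0],  # nothing similar: falls into the dustbin
    ]
    print(many_to_one_match(demo_scores))
```

Because the normalization runs only over the target dimension for each source feature, no mutual-exclusivity constraint is imposed on the target side, which is what lets multiple source points land on the same target cell when the source view is at a larger scale.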

Paper Structure (26 sections, 9 equations, 18 figures, 7 tables)


Figures (18)

  • Figure 1: Comparison among RCM, RCM$_{\mathrm{Lite}}$, SuperGlue, LoFTR, and MatchFormer. The full RCM runs $2.2\times$ faster than LoFTR while achieving superior performance. The lightweight RCM$_{\mathrm{Lite}}$ outperforms SuperGlue by $+7.5\%$ while maintaining a speed advantage. In the visualization, matched features are drawn in the same color.
  • Figure 2: Comparison of three matching methods. (a) Sparse methods rely on precise keypoint detection at corresponding points in both images. (b) The matching conflicts described in the introduction are highlighted. (c) RCM establishes conflict-free coarse matches and adjusts them independently in the fine matching stage.
  • Figure 3: RCM is comprised of four primary modules: 1. The U-Net Extraction Module extracts multi-scale coarse and fine features. 2. The View Switching Module dynamically switches the larger scale image to the sparse branch, where dense coarse features are transformed into sparse features through the detection head. 3. The Conflict-Free Coarse Matching Module involves attention layers processing two sets of coarse features, which are subsequently matched via the many-to-one matching layer. 4. The Fine Matching Module further refines coarse matches based on fine features.
  • Figure 4: The view switcher perceives the scale variation between image pairs to predict a binary classification result, indicating whether to switch the images.
  • Figure 5: Qualitative comparison of SuperGlue, LoFTR, and the proposed RCM on MegaDepth and ScanNet.
  • ...and 13 more figures