OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching

Kun Dai, Tao Xie, Ke Wang, Zhiqiang Jiang, Ruifeng Li, Lijun Zhao

TL;DR

OAMatcher tackles dense local feature matching by mimicking human behavior: it first aggregates global context and then focuses on overlapping regions. It introduces an Overlapping Areas Prediction Module (OAPM) to identify co-visible keypoints and a Match Labels Weight Strategy (MLWS) that weights losses according to label confidence, reducing the impact of noisy labels. The method combines depth-wise convolution with Transformer-based encoding (OMAM and MRB) to capture both global and local cues, and employs a coarse-to-fine refinement pipeline for precise matches. Across HPatches, ScanNet, and MegaDepth, OAMatcher achieves state-of-the-art or competitive accuracy and remains robust under appearance and viewpoint variations, while maintaining efficiency suitable for practical use.

Abstract

Local feature matching is an essential component in many visual applications. In this work, we propose OAMatcher, a Transformer-based detector-free method that imitates human behavior to generate dense and accurate matches. First, OAMatcher predicts overlapping areas to promote effective and clean global context aggregation, with the key insight that, when matching keypoints in image pairs, humans focus on the overlapping areas instead of the entire images after multiple observations. Technically, we first perform global information integration across all keypoints to imitate the human behavior of observing the entire images at the beginning of feature matching. Then, we propose an Overlapping Areas Prediction Module (OAPM) to capture the keypoints in co-visible regions and conduct feature enhancement among them, simulating how humans shift their focus from the entire images to the overlapping regions, hence realizing effective information exchange without interference from keypoints in non-overlapping areas. Besides, since humans tend to leverage probability to determine whether match labels are correct, we propose a Match Labels Weight Strategy (MLWS) to generate coefficients that appraise the reliability of the ground-truth match labels, alleviating the influence of measurement noise in the data. Moreover, we integrate depth-wise convolution into the Transformer encoder layers so that OAMatcher extracts local and global feature representations concurrently. Comprehensive experiments demonstrate that OAMatcher outperforms state-of-the-art methods on several benchmarks while exhibiting excellent robustness to extreme appearance variations. The source code is available at https://github.com/DK-HU/OAMatcher.
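To make the idea behind MLWS more concrete, the following is a minimal sketch of a confidence-weighted matching loss. It is not the paper's formulation: the abstract does not say how the coefficients are computed, so the weights are simply passed in, and the function name `weighted_match_loss` and its normalization by the weight sum are assumptions made for illustration.

```python
import math

def weighted_match_loss(probs, weights, eps=1e-12):
    """Confidence-weighted negative log-likelihood (illustrative assumption).

    probs   -- predicted matching probability for each ground-truth match label
    weights -- label-confidence coefficient for each label (hypothetical input;
               how the paper derives these is not stated in the abstract)
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_nll = sum(-w * math.log(max(p, eps)) for p, w in zip(probs, weights))
    return weighted_nll / total_weight

# A label the network finds implausible (p = 0.1) dominates the plain loss,
# but contributes much less once its confidence coefficient is lowered.
plain_loss = weighted_match_loss([0.9, 0.1], [1.0, 1.0])
down_weighted_loss = weighted_match_loss([0.9, 0.1], [1.0, 0.2])
```

With uniform weights the function reduces to the ordinary mean negative log-likelihood, so the weighting only changes the result when some labels are trusted less than others.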

Paper Structure (21 sections, 20 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Comparison between LoFTR and OAMatcher. Whereas LoFTR integrates information only across the entire images, OAMatcher shifts its focus from the entire images to the overlapping regions, which is closer to human intuition.
  • Figure 2: The network architecture of OAMatcher. OAMatcher uses a Feature Extractor to generate multi-scale features. It then applies the Overlapping Areas Message Aggregation Module to capture co-visible regions, realizing effective and clean context message passing. Finally, the Matches Proposal Block predicts coarse matches, which the Matches Refinement Block refines into the final fine matches.
  • Figure 3: Illustration of OAPM. OAPM uses an adaptive threshold, a morphological close operation, and maximum contours to generate the overlapping areas.
  • Figure 4: Visualization of refinement. Compared with the coarse matches, the fine matches are closer to the ground truth.
  • Figure 5: Illustration of MLWS. The match labels $O-A$ and $O-B$ are assigned different label confidences to alleviate the influence of the confusing ground-truth match label $O-B$.
  • ...and 4 more figures
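
The three post-processing steps named in the Figure 3 caption (adaptive threshold, morphological close, maximum contour) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the threshold is taken to be the mean of the score map, the structuring element is 3x3, and keeping the largest 4-connected region stands in for maximum-contour selection. All function names are hypothetical.

```python
from collections import deque

def adaptive_threshold(scores):
    """Binarize a 2D score map; the mean-value threshold is an assumption."""
    flat = [v for row in scores for v in row]
    threshold = sum(flat) / len(flat)
    return [[1 if v > threshold else 0 for v in row] for row in scores]

def _filter3x3(mask, pick):
    """Apply max (dilation) or min (erosion) over each in-bounds 3x3 window."""
    h, w = len(mask), len(mask[0])
    return [[pick(mask[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def morphological_close(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return _filter3x3(_filter3x3(mask, max), min)

def largest_region(mask):
    """Keep the largest 4-connected region (stand-in for the maximum contour)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out

def predict_overlap_mask(scores):
    """Chain the three steps into one overlapping-area mask."""
    return largest_region(morphological_close(adaptive_threshold(scores)))

# Toy score map: a 3x3 high-score blob with a hole, plus one isolated response.
demo = [[0.0] * 10 for _ in range(10)]
for y in range(2, 5):
    for x in range(2, 5):
        demo[y][x] = 0.9
demo[3][3] = 0.0  # hole inside the blob
demo[7][7] = 0.8  # isolated false positive
mask = predict_overlap_mask(demo)
```

In the toy example the close operation fills the hole inside the blob, and the largest-region step discards the isolated response, leaving a single compact area. A practical version would more likely rely on an image-processing library than on nested Python loops.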