Searching from Area to Point: A Hierarchical Framework for Semantic-Geometric Combined Feature Matching
Yesheng Zhang, Xu Zhao
TL;DR
The paper addresses a bottleneck in feature matching, the poorly defined search space, by introducing semantic area matches as a robust, coarse intermediate search space. It proposes the A2PM framework, which first establishes semantic area matches and then performs point matching within those areas. The framework is implemented by SGAM, which combines Semantic Area Matching (SAM) with Geometry Area Matching (GAM) to enforce geometric consistency. Across indoor and outdoor datasets, SGAM improves sparse, semi-dense, and dense matchers and enhances pose estimation, and Global Match Collection helps in scenes with little semantic content. Performance nevertheless remains dependent on the quality of semantic segmentation. The approach works as a practical plug-in enhancement to existing matchers, reducing redundant computation and improving robustness for downstream tasks such as SLAM and SfM, and it leaves room for parallelization and for future improvements in semantic reliability and generalization.
Abstract
Feature matching is a crucial technique in computer vision. A unified perspective on this task is to treat it as a search problem, in which an efficient search strategy narrows the search space down to point matches between images. A key aspect of any search strategy is the search space itself, which current approaches do not define carefully, and this limits matching accuracy. This paper therefore focuses on the search space and proposes to set the initial search space for point matching to matched image areas that contain prominent semantics, named semantic area matches. This search space favors point matching through salient features and alleviates the accuracy limitation of recent Transformer-based matching methods. To obtain this search space, we introduce a hierarchical feature matching framework, Area to Point Matching (A2PM), which first finds semantic area matches between images and then performs point matching within those area matches. We further propose the Semantic and Geometry Area Matching (SGAM) method to realize this framework; it uses semantic priors and geometric consistency to establish accurate area matches between images. By integrating SGAM with off-the-shelf state-of-the-art matchers, our method, adopting the A2PM framework, achieves encouraging precision improvements in extensive point matching and pose estimation experiments.
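The two-stage idea behind A2PM can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes area matches are already available as pairs of axis-aligned boxes (x0, y0, x1, y1), it omits any resizing of the cropped areas, and the names a2pm_match and center_matcher are hypothetical. The point matcher is passed in as a plain callable, reflecting the plug-in use of off-the-shelf matchers described above.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical area format: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]
# A point matcher takes two crops and returns two (N, 2) arrays of (x, y)
# points in crop coordinates, where row i of each array forms one match.
PointMatcher = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def a2pm_match(
    img0: np.ndarray,
    img1: np.ndarray,
    area_matches: List[Tuple[Box, Box]],
    point_matcher: PointMatcher,
) -> Tuple[np.ndarray, np.ndarray]:
    """Run a point matcher inside each matched area pair and map the
    resulting points back to full-image coordinates."""
    matches0, matches1 = [], []
    for (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) in area_matches:
        # Restrict the search space to the matched areas.
        crop0 = img0[ay0:ay1, ax0:ax1]
        crop1 = img1[by0:by1, bx0:bx1]
        pts0, pts1 = point_matcher(crop0, crop1)
        pts0 = np.asarray(pts0, dtype=float).reshape(-1, 2)
        pts1 = np.asarray(pts1, dtype=float).reshape(-1, 2)
        # Shift crop coordinates by the top-left corner of each area.
        matches0.append(pts0 + np.array([ax0, ay0], dtype=float))
        matches1.append(pts1 + np.array([bx0, by0], dtype=float))
    if not matches0:
        return np.empty((0, 2)), np.empty((0, 2))
    return np.vstack(matches0), np.vstack(matches1)


def center_matcher(crop0: np.ndarray, crop1: np.ndarray):
    """Stand-in matcher for demonstration: matches the two crop centers."""
    h0, w0 = crop0.shape[:2]
    h1, w1 = crop1.shape[:2]
    return np.array([[w0 / 2, h0 / 2]]), np.array([[w1 / 2, h1 / 2]])


img0 = np.zeros((100, 100))
img1 = np.zeros((100, 100))
areas = [((10, 20, 30, 40), (50, 60, 70, 80))]
pts0, pts1 = a2pm_match(img0, img1, areas, center_matcher)
```

In practice center_matcher would be replaced by a real sparse, semi-dense, or dense matcher, and the area pairs would come from an area matching stage such as SGAM rather than being given by hand.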
