MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation
Yesheng Zhang, Shuhan Shen, Xu Zhao
TL;DR
This work tackles matching redundancy in feature matching by establishing implicit-semantic area matching with the Segment Anything Model (SAM) before point matching (PM). The authors propose MESA, a sparse area matching pipeline using Area Graphs and two graphical models, and DMESA, its dense, faster counterpart, which uses off-the-shelf patch matching and a Gaussian Mixture Model with finite-step EM for cycle-consistent area matching. Across five diverse datasets and five point matching baselines, both methods consistently improve PM performance, and DMESA runs nearly five times faster than MESA at competitive accuracy. The approach improves robustness to image resolution changes and generalizes well, which makes area matching a versatile upgrade to existing PM pipelines. The work also provides practical guidelines for resolution configuration within the A2PM framework and includes ablations, runtime analyses, and cross-domain evaluations.
Abstract
We propose MESA and DMESA as novel feature matching methods, which utilize the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Informative area matches with consistent internal semantics can then undergo dense feature comparison, facilitating precise inside-area point matching. Specifically, MESA adopts a sparse matching framework and first obtains candidate areas from SAM results through a novel Area Graph (AG). Area matching among the candidates is then formulated as graph energy minimization and solved by graphical models derived from the AG. To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, which applies a dense matching framework. After candidate areas are identified by the AG, DMESA establishes area matches by generating dense matching distributions. The distributions are produced from off-the-shelf patch matching using a Gaussian Mixture Model and refined via Expectation Maximization. With less repetitive computation, DMESA is nearly five times faster than MESA while maintaining competitive accuracy. Our methods are extensively evaluated on five datasets encompassing indoor and outdoor scenes. The results show consistent performance improvements from our methods for five distinct point matching baselines across all datasets. Furthermore, our methods exhibit promising generalization and improved robustness against image resolution variations. The code is publicly available at https://github.com/Easonyesheng/A2PM-MESA.
