PATS: Patch Area Transportation with Subdivision for Local Feature Matching
Junjie Ni, Yijin Li, Zhaoyang Huang, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang
TL;DR
The paper tackles local feature matching under large inter-image scale differences, a weakness of many detector-free methods. It introduces Patch Area Transportation with Subdivision (PATS), which partitions images into patches, learns scale differences in a self-supervised manner through an area-transportation formulation, and supports many-to-many patch correspondences through a differentiable Sinkhorn-based optimizer. A scale-adaptive subdivision scheme refines matches from coarse to fine by aligning content scales and re-sampling patches, yielding accurate and robust correspondences for relative pose estimation, visual localization, and optical flow. Results on multiple datasets show state-of-the-art performance and robustness to extreme scale changes, and ablations validate the contributions of area regression, patch transportation, and hierarchical subdivision. The method does not run in real time, but its framework provides a foundation for scalable detector-free matching and for possible real-time extensions to SLAM.
Abstract
Local feature matching aims to establish sparse correspondences between a pair of images. Recently, detector-free methods have generally performed better, but they remain unsatisfactory on image pairs with large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to tackle this issue. Instead of building an expensive image pyramid, we start by splitting the original image pair into equal-sized patches, then gradually resize and subdivide them into smaller patches with the same scale. However, estimating scale differences between these patches is non-trivial, since the scale differences are determined by both relative camera poses and scene structures and thus vary spatially over image pairs. Moreover, it is hard to obtain the ground truth for real scenes. To this end, we propose patch area transportation, which enables learning scale differences in a self-supervised manner. In contrast to bipartite graph matching, which only handles one-to-one matching, our patch area transportation can deal with many-to-many relationships. PATS improves both matching accuracy and coverage, and shows superior performance in downstream tasks such as relative pose estimation, visual localization, and optical flow estimation. The source code is available at https://zju3dv.github.io/pats/.
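To make the contrast with one-to-one bipartite matching concrete, the following is a minimal sketch of generic entropic optimal transport solved with Sinkhorn iterations. It is not the PATS implementation: the helper name `sinkhorn`, the toy cost matrix, the patch areas, and the regularization strength `eps` are all assumptions chosen for illustration. It only shows how a transport plan can split the area of one source patch across several target patches while a target patch also receives area from several source patches.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropic optimal transport between marginals a and b.

    Hypothetical helper for illustration; cost[i][j] is the matching cost
    between source patch i and target patch j, and sum(a) must equal sum(b).
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low cost means high affinity.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: plan[i][j] is the area moved from source i to target j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy setup (assumed values): two source patches of unit area and three
# target patches whose areas sum to the same total.
source_area = [1.0, 1.0]
target_area = [0.5, 1.0, 0.5]
cost = [
    [0.0, 0.2, 1.0],  # source 0 resembles targets 0 and 1
    [1.0, 0.2, 0.0],  # source 1 resembles targets 1 and 2
]

plan = sinkhorn(cost, source_area, target_area)
for row in plan:
    print([round(x, 3) for x in row])
```

In this toy example each source patch sends a substantial share of its area to two target patches, and the middle target patch receives area from both sources, which a one-to-one assignment cannot express. In a learned setting the costs would come from feature similarity and the areas would be predicted by the network; both are hard-coded here.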
