
PATS: Patch Area Transportation with Subdivision for Local Feature Matching

Junjie Ni, Yijin Li, Zhaoyang Huang, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

TL;DR

The paper tackles local feature matching under large inter-image scale differences, a weakness of many detector-free methods. It introduces Patch Area Transportation with Subdivision (PATS), which partitions images into patches, learns scale differences in a self-supervised manner through an area-transportation formulation, and supports many-to-many patch correspondences by solving that formulation with a differentiable Sinkhorn-based optimizer. A scale-adaptive subdivision scheme then refines matches from coarse to fine by aligning content scales and re-sampling patches. This yields accurate, robust correspondences for relative pose estimation, visual localization, and optical flow. Experiments across multiple datasets show state-of-the-art performance and robustness to extreme scale changes, and ablations confirm the contributions of area regression, patch transportation, and hierarchical subdivision. Although the method does not run in real time, its framework provides a strong foundation for scalable detector-free matching and for potential real-time extensions to SLAM.
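The area-transportation step can be illustrated with a standard entropic optimal-transport (Sinkhorn) iteration. This is a minimal sketch under stated assumptions, not the paper's implementation. The function name `sinkhorn_area_transport`, the temperature `eps`, the iteration count, and the rescaling of the regressed target areas to match the total source mass are all hypothetical choices made for illustration. Each source patch carries unit area (as in $\mathbf a_S = \mathbf 1_N$), and the returned matrix `P` plays the role of the transport plan.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_area_transport(sim, a_src, a_tgt, eps=0.1, n_iters=200):
    """Entropic transport plan P with row sums a_src and column sums a_tgt.

    sim:   (N, M) visual similarities between source and target patches.
    a_src: (N,) source areas (all ones in the PATS setting).
    a_tgt: (M,) regressed target areas, assumed strictly positive.
    """
    sim = np.asarray(sim, dtype=float)
    a_src = np.asarray(a_src, dtype=float)
    a_tgt = np.asarray(a_tgt, dtype=float)
    # Hypothetical simplification: rescale target areas so both sides carry the
    # same total mass, as balanced Sinkhorn requires.
    a_tgt = a_tgt * (a_src.sum() / a_tgt.sum())
    log_k = sim / eps  # log of the Gibbs kernel exp(sim / eps)
    log_u = np.zeros_like(a_src)
    log_v = np.zeros_like(a_tgt)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the area marginals.
        log_u = np.log(a_src) - _logsumexp(log_k + log_v[None, :], axis=1)
        log_v = np.log(a_tgt) - _logsumexp(log_k + log_u[:, None], axis=0)
    return np.exp(log_u[:, None] + log_k + log_v[None, :])
```

For instance, take `sim = [[1, 0], [1, 0], [0, 1]]`, unit source areas, and target areas `[2, 1]`. The plan sends nearly all of source patches 0 and 1 into target patch 0. One-to-one bipartite matching cannot represent this kind of many-to-many relationship.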

Abstract

Local feature matching aims at establishing sparse correspondences between a pair of images. Recently, detector-free methods have shown generally better performance, but they remain unsatisfactory on image pairs with large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to tackle this issue. Instead of building an expensive image pyramid, we start by splitting the original image pair into equal-sized patches, then gradually resize and subdivide them into smaller patches at the same scale. However, estimating the scale differences between these patches is non-trivial. They are determined by both the relative camera pose and the scene structure, and therefore vary spatially across an image pair. Moreover, ground truth for these scale differences is hard to obtain in real scenes. To this end, we propose patch area transportation, which enables learning scale differences in a self-supervised manner. In contrast to bipartite graph matching, which only handles one-to-one matching, our patch area transportation can deal with many-to-many relationships. PATS improves both matching accuracy and coverage, and shows superior performance in downstream tasks such as relative pose estimation, visual localization, and optical flow estimation. The source code is available at https://zju3dv.github.io/pats/.
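The patch splitting, resizing, and subdivision described above can be sketched with NumPy array reshaping. This is a hypothetical illustration, not the released code. The helper names are ours, and nearest-neighbour resizing is a simplification; a real pipeline would more likely use bilinear interpolation.

```python
import numpy as np


def split_into_patches(img, patch_size):
    """Split an (H, W, ...) array into non-overlapping patch_size x patch_size patches.

    Returns an array of shape (H // p, W // p, p, p, ...), where out[r, c] is the
    patch covering rows r*p:(r+1)*p and columns c*p:(c+1)*p.
    """
    h, w = img.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    grid = img.reshape(rows, patch_size, cols, patch_size, *img.shape[2:])
    # Move the two grid axes to the front: (rows, cols, p, p, ...).
    return grid.transpose((0, 2, 1, 3) + tuple(range(4, grid.ndim)))


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (a simplification; bilinear is more typical)."""
    h, w = img.shape[:2]
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(out_w) * w // out_w
    return img[row_idx[:, None], col_idx[None, :]]


def align_and_subdivide(crop, patch_size, factor):
    """Resize a cropped window to a canonical patch size, then split it into
    factor x factor sub-patches (hypothetical helper for one subdivision step)."""
    resized = resize_nearest(crop, patch_size, patch_size)
    return split_into_patches(resized, patch_size // factor)
```

For example, calling `align_and_subdivide(crop, 16, 2)` on a cropped window of any size yields a 2 x 2 grid of 8 x 8 sub-patches. Resizing to a fixed size before subdividing means every sub-patch is processed at the same scale.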

Paper Structure

This paper contains 14 sections, 10 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Two-view reconstruction results of LoFTR [loftr], ASpanFormer [aspanformer], PDC-Net+ [pdcnet++], and our approach on the MegaDepth dataset [megadepth]. PATS extracts high-quality matches under severe scale variations and in indistinctive regions with repetitive patterns. This enables semi-dense two-view reconstruction by simply triangulating the matches in an image pair. In contrast, the other methods either obtain fewer matches or produce erroneous results.
  • Figure 2: Scale Alignment with Patch Area Transportation. Our approach learns many-to-many relationships and scale differences by solving the patch area transportation problem. We then crop the patches and resize the image content to align the scales, which removes the appearance distortion.
  • Figure 3: Overview of PATS. We a) extract features for patches. Then, we b) formulate the patch area transportation by setting the source patches' area $\mathbf a_S$ to $\mathbf 1_N$, regressing the target patches' area $\mathbf a_T$, and bounding the transportation via the visual similarities $\mathbf C$. The feature descriptors $\mathbf f$ that produce $\mathbf C$ and the area regression $\mathbf a_T$ are learned by solving this problem differentiably. The solution $\mathbf P$ of this problem also reveals many-to-many patch relationships. Based on $\mathbf P$, we c) find the corresponding region for each source patch, represented by the target patches inside a bounding box $B_i$. The exact corresponding position $\hat{\mathbf p}_i$ of each patch is the position expectation over $B_i$. We then crop and resize the image contents according to the obtained window sizes, which aligns the contents to the same scale. Finally, we d) subdivide the cropped contents into smaller patches and proceed to the next iteration.
  • Figure 4: Sub-patch Trimming. a) The windows of neighboring source patches partially overlap due to the expansion. b) After subdivision, the sub-patches at the overlapping locations are redundant. c) We retain the sub-patches that send the largest effective area $\sum_{j\in B_k} P_{k,j}$ as patches for the next level.
  • Figure 5: Qualitative Comparison of Feature Matching. Matched features are visualized in the same color, and incorrect matches have been filtered out. PATS shows superior performance in both accuracy and coverage.
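Two steps from the Figure 3 and Figure 4 captions are sketched below: the position expectation over $B_i$, and the trimming of redundant sub-patches by effective area. The sketch rests on several assumptions. The expectation weights each target patch centre in $B_i$ by its transported area $P_{i,j}$. The box is given as a list of target-patch indices. Each sub-patch inherits the effective area $\sum_{j\in B_k} P_{k,j}$ of the source patch $k$ whose window produced it. The function names and the candidate-tuple representation are hypothetical.

```python
import numpy as np


def effective_area(P, i, box):
    """Area that source patch i sends into the target patches of its box B_i."""
    idx = np.asarray(box, dtype=int)
    return float(P[i, idx].sum())


def expected_position(P, i, box, target_xy):
    """Transport-weighted mean of the target patch centres inside B_i (assumed weighting).

    target_xy: (M, 2) array of target patch centres.
    """
    idx = np.asarray(box, dtype=int)
    weights = P[i, idx]
    total = weights.sum()
    if total <= 0:
        raise ValueError("source patch sends no area into its box")
    return (weights[:, None] * target_xy[idx]).sum(axis=0) / total


def trim_overlaps(candidates):
    """Keep one sub-patch per location: the one with the largest effective area.

    candidates: iterable of (location, effective_area, sub_patch_id) tuples, where
    location is a hashable grid coordinate (hypothetical representation).
    """
    best = {}
    for loc, area, sub_id in candidates:
        if loc not in best or area > best[loc][0]:
            best[loc] = (area, sub_id)
    return {loc: sub_id for loc, (_, sub_id) in best.items()}
```

Using a weighted mean rather than the single highest-scoring target patch gives a sub-patch-accurate position. Without that, the estimate would snap to the coarse patch grid.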