U(PM)$^2$: Unsupervised polygon matching with pre-trained models for challenging stereo images

Chang Li, Xingtao Peng

TL;DR

U(PM)$^2$ is a training-free polygon matching framework for stereo imagery that fuses pre-trained detectors with handcrafted geometric constraints. The pipeline comprises a SAM-based detector, a global matcher with a bidirectional-pyramid search and a deep-spectral factor, and a local matcher (LoJoGM) that uses the Hungarian algorithm to enforce robust polygon correspondences. It also introduces a discrete polygon-matching metric for evaluation against ground truth. The approach achieves state-of-the-art accuracy with competitive speed on SceneFlow and ScanNet while generalizing well to challenging imagery, demonstrating a practical, training-free solution for high-level scene understanding in applications such as urban reconstruction and augmented reality (AR).

Abstract

Stereo image matching is a fundamental task in computer vision, photogrammetry, and remote sensing, but polygon matching remains an almost unexplored subfield that faces four challenges: disparity discontinuity, scale variation, training requirements, and generalization. To address these issues, this paper proposes U(PM)$^2$, a low-cost unsupervised polygon matching method built on pre-trained models that unites automatically learned and handcrafted features. The pipeline is as follows. First, the detector leverages the pre-trained Segment Anything Model (SAM) to obtain masks. Second, the vectorizer converts the masks into polygons and graphic structure. Third, the global matcher addresses global viewpoint changes and scale variation using a bidirectional-pyramid strategy with pre-trained LoFTR. Finally, the local matcher overcomes local disparity discontinuity and topological inconsistency in polygon matching through a local-joint geometry and multi-feature matching strategy solved with the Hungarian algorithm. We benchmark U(PM)$^2$ on the ScanNet and SceneFlow datasets using our proposed new metric; it achieves state-of-the-art accuracy at a competitive speed and satisfactory generalization at low cost, without any training.

Paper Structure

This paper contains 19 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A visual representation of the different types of matching primitives (points, lines, polygons).
  • Figure 2: U(PM)$^2$ vs. MASA and MESA in visual comparison. The locations indicated by the rounded boxes are where the topological relationships have changed. MASA fails to detect polygons or areas because it was designed for different tasks. MESA achieves lower area overlap ratios and higher mismatch rates. In contrast, U(PM)$^2$ attains the highest number of matches with the finest-grained accuracy (down to individual building-level accuracy).
  • Figure 3: Overview of U(PM)$^2$. 1) The detector jointly performs polygon and feature point detection on stereo images to construct polygons with graphic structure. 2) The global matcher matches the feature points extracted by the detector, establishing reliable correspondences and global geometric constraints. 3) The local matcher eliminates ambiguous matches through the geometry and multi-feature matching strategy, solving a bipartite graph matching problem to optimize the final polygon matching.
  • Figure 4: Bidirectional-pyramid matching. The bidirectional pyramid progressively narrows the search region from the top-level initial area toward lower levels to establish geometric constraints for subsequent local matching.
  • Figure 5: The proposed local matcher. It addresses matching failures caused by local deformations and topological inconsistencies. The underlying cause is that viewpoint variation produces disparity variation, which in turn leads to topological inconsistency (e.g., in the relative position of the box and the polygonal area). To solve this issue, local matching jointly uses local geometric and texture correlations.
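
The local matcher above resolves polygon correspondences as a bipartite assignment solved with the Hungarian algorithm. The following is a minimal sketch of that idea only, not the paper's implementation: the cost function (`pair_cost`, mixing vertex-mean distance and area dissimilarity), the `area_weight` parameter, and the toy polygons `LEFT` and `RIGHT` are all assumptions made for illustration. The paper's actual geometric and texture features are not reproduced here.

```python
import math

def area(poly):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    nxt = poly[1:] + poly[:1]
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, nxt))) / 2.0

def vertex_mean(poly):
    """Mean of the vertices, used here as a cheap stand-in for the centroid."""
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)

def pair_cost(a, b, area_weight=1.0):
    """Hypothetical cost: vertex-mean distance plus area dissimilarity.
    Assumes both polygons have non-zero area."""
    (ax, ay), (bx, by) = vertex_mean(a), vertex_mean(b)
    sa, sb = area(a), area(b)
    return math.hypot(ax - bx, ay - by) + area_weight * (1.0 - min(sa, sb) / max(sa, sb))

def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.
    Returns sorted (row, col) pairs. Classic O(n^2 m) potentials formulation."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p = [0] * (m + 1)      # p[j]: 1-based row assigned to column j (0 = free)
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:              # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])

def match_polygons(left, right):
    """Return sorted (left_index, right_index) pairs minimizing total pair_cost."""
    if len(left) <= len(right):
        return hungarian([[pair_cost(a, b) for b in right] for a in left])
    flipped = hungarian([[pair_cost(a, b) for a in left] for b in right])
    return sorted((i, j) for j, i in flipped)

# Toy stereo pair: the right view is shifted horizontally and lists the polygons in swapped order.
LEFT = [[(0, 0), (2, 0), (2, 2), (0, 2)], [(5, 0), (7, 0), (6, 2)]]
RIGHT = [[(4.5, 0), (6.5, 0), (5.5, 2)], [(-0.5, 0), (1.5, 0), (1.5, 2), (-0.5, 2)]]

print(match_polygons(LEFT, RIGHT))  # [(0, 1), (1, 0)]
```

In this toy case the square in the left view is paired with the square in the right view and the triangle with the triangle, despite the swapped ordering. A one-to-one assignment of this kind prevents two polygons from claiming the same partner, which is the property the Hungarian step provides regardless of which features feed the cost matrix.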