U(PM)$^2$: Unsupervised polygon matching with pre-trained models for challenging stereo images
Chang Li, Xingtao Peng
TL;DR
U(PM)$^2$ is a training-free polygon matching framework for stereo imagery that fuses pre-trained models with handcrafted geometric constraints. The pipeline comprises a SAM-based detector, a global matcher with a bidirectional-pyramid search and a deep-spectral factor, and a local matcher (LoJoGM) that uses the Hungarian algorithm to establish robust polygon correspondences. The paper also introduces a discrete polygon-matching metric for evaluation against ground truth. The approach achieves state-of-the-art accuracy at competitive speed on SceneFlow and ScanNet and generalizes well to challenging imagery, offering a practical, training-free option for high-level scene understanding in applications such as urban reconstruction and AR.
Abstract
Stereo image matching is a fundamental task in computer vision, photogrammetry, and remote sensing, yet polygon matching remains almost unexplored. It faces four challenges: disparity discontinuity, scale variation, training requirements, and generalization. To address these issues, this paper proposes U(PM)$^2$, a low-cost unsupervised polygon matching method with pre-trained models that unites automatically learned and handcrafted features. The pipeline is as follows: first, the detector leverages the pre-trained Segment Anything Model (SAM) to obtain masks; second, the vectorizer converts the masks to polygons and a graphic structure; third, the global matcher addresses global viewpoint changes and scale variation using a bidirectional-pyramid strategy with pre-trained LoFTR; finally, the local matcher overcomes local disparity discontinuity and topology inconsistency in polygon matching using a local-joint geometry and multi-feature matching strategy with the Hungarian algorithm. We benchmark U(PM)$^2$ on the ScanNet and SceneFlow datasets using our proposed new metric; it achieves state-of-the-art accuracy at a competitive speed and satisfactory generalization at low cost, without any training.
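The last stage of the pipeline relies on the Hungarian algorithm to produce one-to-one polygon correspondences. The following is a minimal sketch of that assignment step only, not the authors' local matcher. The function names (`polygon_cost`, `hungarian`, `match_polygons`), the `max_cost` threshold, and the cost itself (distance between vertex means plus relative area difference) are illustrative assumptions that stand in for the paper's local-joint geometry and multi-feature terms.

```python
from math import hypot, inf
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def area(poly: Polygon) -> float:
    """Polygon area via the shoelace formula."""
    pts = list(poly)
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def vertex_mean(poly: Polygon) -> Point:
    """Mean of the vertices (a cheap stand-in for the true centroid)."""
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)


def polygon_cost(a: Polygon, b: Polygon) -> float:
    """Hypothetical dissimilarity: position gap plus relative area gap."""
    (ax, ay), (bx, by) = vertex_mean(a), vertex_mean(b)
    area_a, area_b = area(a), area(b)
    rel_area = abs(area_a - area_b) / max(area_a, area_b, 1e-9)
    return hypot(ax - bx, ay - by) + rel_area


def hungarian(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment for an n x m matrix with n <= m, O(n^2 m)."""
    n, m = len(cost), len(cost[0])
    assert n <= m, "expects at most as many rows as columns"
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (m + 1)          # column potentials (1-based)
    p = [0] * (m + 1)            # p[j]: row assigned to column j, 0 if free
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached a free column: path is complete
                break
        while j0:                # flip assignments back along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


def match_polygons(left: List[Polygon], right: List[Polygon],
                   max_cost: float = inf) -> List[Tuple[int, int]]:
    """Return (left_index, right_index) pairs, dropping costly matches."""
    if not left or not right:
        return []
    swap = len(left) > len(right)    # keep rows <= columns for hungarian()
    rows, cols = (right, left) if swap else (left, right)
    cost = [[polygon_cost(r, c) for c in cols] for r in rows]
    pairs = []
    for i, j in hungarian(cost):
        if cost[i][j] <= max_cost:
            pairs.append((j, i) if swap else (i, j))
    return sorted(pairs)
```

Calling `match_polygons(left, right, max_cost=5.0)` on two lists of vertex lists returns index pairs; polygons without a sufficiently cheap partner stay unmatched, which is one simple way to handle regions visible in only one view. Where SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and could replace the hand-written `hungarian` helper.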
