SegMASt3R: Geometry Grounded Segment Matching
Rohit Jayanti, Swayam Agrawal, Vansh Garg, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna
TL;DR
SegMASt3R tackles wide-baseline segment matching by repurposing a 3D foundation model (MASt3R) with a segment-feature head and a differentiable Sinkhorn-based matcher, producing robust segment correspondences across image pairs with extreme viewpoint changes (up to $180^{\circ}$). The approach leverages geometry-aware priors from 3D pretraining, yielding segment-level representations and matching performance that surpass SAM2’s propagator and several local feature methods on indoor and outdoor benchmarks. It introduces a differentiable segment matching layer with a learnable dustbin, trained end to end, and demonstrates practical impact on downstream tasks such as 3D instance mapping and object-relative navigation. Overall, SegMASt3R establishes segment matching as a geometry-guided, transferable capability that improves robustness to occlusion, appearance changes, and perceptual aliasing in complex scenes.
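To make the "Sinkhorn-based matcher with a learnable dustbin" concrete, here is a minimal NumPy sketch of one common formulation (log-domain Sinkhorn over a score matrix augmented with a dustbin row and column). It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of marginals, the iteration count, and the toy scores are all hypothetical, and the dustbin score is a fixed scalar here, whereas in a trained model it would be a learned parameter.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    x_max = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(x_max, axis=axis) + np.log(np.sum(np.exp(x - x_max), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score, n_iters=100):
    """Return a log soft-assignment of shape (m + 1, n + 1).

    scores: (m, n) similarities between segments of image A and image B.
    dustbin_score: scalar score for leaving a segment unmatched
    (assumed fixed here; it would be learnable in a trained model).
    """
    m, n = scores.shape
    # Augment with one extra row and column that absorb unmatched segments.
    aug = np.full((m + 1, n + 1), float(dustbin_score))
    aug[:m, :n] = scores
    # Assumed marginals: each real segment carries mass 1; the dustbins
    # carry enough mass to absorb every segment of the other image.
    log_mu = np.concatenate([np.zeros(m), [np.log(n)]])
    log_nu = np.concatenate([np.zeros(n), [np.log(m)]])
    u = np.zeros(m + 1)
    v = np.zeros(n + 1)
    for _ in range(n_iters):
        # Alternate row and column normalisation in the log domain.
        u = log_mu - logsumexp(aug + v[None, :], axis=1)
        v = log_nu - logsumexp(aug + u[:, None], axis=0)
    return aug + u[:, None] + v[None, :]


def mutual_matches(log_p):
    """Keep pairs (i, j) that are each other's best non-dustbin choice."""
    m, n = log_p.shape[0] - 1, log_p.shape[1] - 1
    best_col = log_p[:m, :].argmax(axis=1)  # may select the dustbin column n
    best_row = log_p[:, :n].argmax(axis=0)  # may select the dustbin row m
    return [(i, int(j)) for i, j in enumerate(best_col) if j < n and best_row[j] == i]


# Toy example: 2 segments in image A, 3 in image B (hypothetical scores).
scores = np.array([[0.0, 9.0, 0.0],
                   [8.0, 0.0, 0.0]])
log_p = sinkhorn_with_dustbin(scores, dustbin_score=1.0)
print(mutual_matches(log_p))
```

In this toy case the strong scores pair segment 0 of A with segment 1 of B and segment 1 of A with segment 0 of B, while segment 2 of B falls to the dustbin. Every step is a composition of differentiable operations, which is what allows such a layer to be trained end to end in an autodiff framework.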
Abstract
Segment matching is an important intermediate task in computer vision that establishes correspondences between semantically or geometrically coherent regions across images. Unlike keypoint matching, which focuses on localized features, segment matching captures structured regions, offering greater robustness to occlusions, lighting variations, and viewpoint changes. In this paper, we leverage the spatial understanding of 3D foundation models to tackle wide-baseline segment matching, a challenging setting involving extreme viewpoint shifts. We propose an architecture that uses the inductive bias of these 3D foundation models to match segments across image pairs with up to 180 degrees of viewpoint rotation. Extensive experiments show that our approach outperforms state-of-the-art methods, including the SAM2 video propagator and local feature matching methods, by up to 30% on the AUPRC metric on the ScanNet++ and Replica datasets. We further demonstrate the benefits of the proposed model on relevant downstream tasks, including 3D instance mapping and object-relative navigation. Project Page: https://segmast3r.github.io/
