Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting

Juncheng Chen, Chao Xu, Yanjun Cao

TL;DR

We address the need for high-fidelity, diverse training data for pixel-level image matching by combining geometry-refined 3D Gaussian Splatting (3DGS) with 2D-3D representation alignment. MatchGS comprises a geometrically faithful data generation pipeline and a coarse-to-fine alignment strategy that injects explicit 3D Gaussian attributes into semi-dense matchers, enabling zero-shot generalization. The generated ground-truth correspondences reduce epipolar error by up to 40 times compared to existing datasets, and matchers trained on them show significant zero-shot gains on public benchmarks (up to 17.7% on ScanNet and 16.2% on ZEB). This work demonstrates that scalable, high-fidelity 3D-aware data can drive robust image matching across unseen scenes and viewpoints, paving the way for a new generation of zero-shot matchers.

Abstract

Learning-based image matching critically depends on large-scale, diverse, and geometrically accurate training data. 3D Gaussian Splatting (3DGS) enables photorealistic novel-view synthesis and thus is attractive for data generation. However, its geometric inaccuracies and biased depth rendering currently prevent robust correspondence labeling. To address this, we introduce MatchGS, the first framework designed to systematically correct and leverage 3DGS for robust, zero-shot image matching. Our approach is twofold: (1) a geometrically-faithful data generation pipeline that refines 3DGS geometry to produce highly precise correspondence labels, enabling the synthesis of a vast and diverse range of viewpoints without compromising rendering fidelity; and (2) a 2D-3D representation alignment strategy that infuses 3DGS' explicit 3D knowledge into the 2D matcher, guiding 2D semi-dense matchers to learn viewpoint-invariant 3D representations. Our generated ground-truth correspondences reduce the epipolar error by up to 40 times compared to existing datasets, enable supervision under extreme viewpoint changes, and provide self-supervisory signals through Gaussian attributes. Consequently, state-of-the-art matchers trained solely on our data achieve significant zero-shot performance gains on public benchmarks, with improvements of up to 17.7%. Our work demonstrates that with proper geometric refinement, 3DGS can serve as a scalable, high-fidelity, and structurally-rich data source, paving the way for a new generation of robust zero-shot image matchers.
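The abstract measures label quality by epipolar error. As a rough illustration of what such a metric checks, the sketch below computes a symmetric epipolar distance for a single correspondence in normalized image coordinates. The exact error definition used by the paper is not stated here, so the symmetric point-to-line form, the pose convention `x2 = R x1 + t`, and all function names are assumptions made for illustration only.

```python
import math


def skew(t):
    """Cross-product matrix [t]_x of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def essential_matrix(R, t):
    """E = [t]_x R, assuming camera-2 points satisfy X2 = R X1 + t."""
    return matmul(skew(t), R)


def symmetric_epipolar_distance(E, x1, x2):
    """Mean point-to-epipolar-line distance over both images.

    x1 and x2 are (x, y) pairs in normalized image coordinates.
    """
    p1 = [x1[0], x1[1], 1.0]
    p2 = [x2[0], x2[1], 1.0]
    line2 = matvec(E, p1)             # epipolar line of p1 in image 2
    line1 = matvec(transpose(E), p2)  # epipolar line of p2 in image 1
    residual = abs(sum(p2[i] * line2[i] for i in range(3)))  # |p2^T E p1|
    d2 = residual / math.hypot(line2[0], line2[1])
    d1 = residual / math.hypot(line1[0], line1[1])
    return 0.5 * (d1 + d2)


# Hypothetical setup: camera 2 is camera 1 translated along x, no rotation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
E = essential_matrix(R, t)

# The 3D point (0.5, 0.2, 2.0) projects to x1 in camera 1 and x2 in camera 2.
exact = symmetric_epipolar_distance(E, (0.25, 0.1), (0.75, 0.1))
noisy = symmetric_epipolar_distance(E, (0.25, 0.1), (0.75, 0.11))
```

A geometrically consistent label gives a distance of zero, while a label whose target pixel is displaced off the epipolar line gives a distance that grows with the displacement. Averaging this quantity over a dataset's labels is one plausible way to compare label sources.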

Paper Structure

This paper contains 20 sections, 16 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: (a) illustrates our data generation pipeline. Given train-view images and monocular depth priors, we first reconstruct the scene using our geometry-improved 3DGS. Augmented viewpoints are then generated from train views, with pre-rendering checks removing outliers before rendering usable data. (b-1) to (b-4) compare four depth rendering methods detailed in Sec. \ref{sec:data_generation}.
  • Figure 2: Visualization of data generation quality. Our proposed pipeline can freely generate dense and accurate labels under large variations in viewpoint and scale.
  • Figure 3: Coarse-level representation alignment. Given a coarse-to-fine matcher, local crops at the 2D positions indicated by ground-truth coarse matches are encoded as patch embeddings. Simultaneously, the 3D positions of the matches are used to query multi-scale voxel features from Point Transformer, which are encoded as voxel embeddings. The two embeddings are aligned via a contrastive loss. The trained patch embedding head is then frozen and used to assist correlation computation.
  • Figure 4: Qualitative Results. We compare with current state-of-the-art semi-dense matchers. Our method shows superior robustness under large viewpoint changes in both indoor and outdoor scenes.
  • Figure 5: Successful and failed cases on the MegaDepth dataset, using MatchGS$_\text{ELoFTR}$ for zero-shot testing.
  • ...and 1 more figure
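
The Figure 3 caption describes aligning patch embeddings with voxel embeddings via a contrastive loss. The caption does not specify the loss, so the sketch below assumes a symmetric InfoNCE objective with a temperature of 0.07, in which the i-th patch embedding and the i-th voxel embedding form the positive pair and all other pairings in the batch act as negatives. The function names and the toy embeddings are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(patch_emb, voxel_emb, temperature=0.07):
    """Symmetric InfoNCE between paired 2D patch and 3D voxel embeddings."""
    p = [l2_normalize(v) for v in patch_emb]
    q = [l2_normalize(v) for v in voxel_emb]
    n = len(p)
    # Cosine similarity of every patch/voxel pair, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(p[i], q[j])) / temperature
               for j in range(n)] for i in range(n)]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the patch-to-voxel and voxel-to-patch directions.
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))


# Toy batch of two matches with 2-D embeddings.
patches = [[1.0, 0.0], [0.0, 1.0]]
aligned_voxels = [[1.0, 0.0], [0.0, 1.0]]   # each voxel agrees with its patch
swapped_voxels = [[0.0, 1.0], [1.0, 0.0]]   # each voxel agrees with the wrong patch

loss_aligned = info_nce(patches, aligned_voxels)
loss_swapped = info_nce(patches, swapped_voxels)
```

The loss is close to zero when each patch embedding is most similar to its own voxel embedding and large when the pairing is swapped, which is the behaviour that pulls the 2D features toward the 3D representation during training.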