Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting
Juncheng Chen, Chao Xu, Yanjun Cao
TL;DR
We address the need for high-fidelity, diverse training data for pixel-level image matching by combining geometry-refined 3D Gaussian Splatting (3DGS) with 2D-3D representation alignment. MatchGS comprises a geometrically faithful data generation pipeline and a coarse-to-fine alignment strategy that injects explicit 3D Gaussian attributes into semi-dense matchers, enabling zero-shot generalization. The approach yields ground-truth correspondences with substantially reduced epipolar error and delivers significant zero-shot gains on public benchmarks (up to 17.7% on ScanNet and 16.2% on ZEB). This work demonstrates that scalable, high-fidelity, 3D-aware data can drive robust image matching across unseen scenes and viewpoints, paving the way for a new generation of zero-shot matchers.
Abstract
Learning-based image matching depends critically on large-scale, diverse, and geometrically accurate training data. 3D Gaussian Splatting (3DGS) enables photorealistic novel-view synthesis and is therefore attractive for data generation. However, its geometric inaccuracies and biased depth rendering currently prevent robust correspondence labeling. To address this, we introduce MatchGS, the first framework designed to systematically correct and leverage 3DGS for robust, zero-shot image matching. Our approach is twofold: (1) a geometrically faithful data generation pipeline that refines 3DGS geometry to produce highly precise correspondence labels, enabling the synthesis of a vast and diverse range of viewpoints without compromising rendering fidelity; and (2) a 2D-3D representation alignment strategy that infuses the explicit 3D knowledge of 3DGS into 2D semi-dense matchers, guiding them to learn viewpoint-invariant 3D representations. Our generated ground-truth correspondences reduce the epipolar error by up to 40 times compared to existing datasets, enable supervision under extreme viewpoint changes, and provide self-supervisory signals through Gaussian attributes. Consequently, state-of-the-art matchers trained solely on our data achieve significant zero-shot performance gains on public benchmarks, with improvements of up to 17.7%. Our work demonstrates that, with proper geometric refinement, 3DGS can serve as a scalable, high-fidelity, and structurally rich data source, paving the way for a new generation of robust zero-shot image matchers.
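The abstract uses epipolar error as its measure of correspondence-label quality. This section does not state which variant of the metric is used, so the following is only a minimal sketch under stated assumptions: a symmetric epipolar distance in pixels, pinhole intrinsics with zero skew, and the pose convention X2 = R X1 + t. All function names and numeric values are illustrative and are not taken from MatchGS.

```python
import math

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec3(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose3(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def inv_intrinsics(K):
    """Closed-form inverse of a zero-skew pinhole intrinsics matrix (assumed)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]

def fundamental(K1, K2, R, t):
    """F = K2^-T [t]_x R K1^-1 for the assumed convention X2 = R X1 + t."""
    E = matmul3(skew(t), R)
    return matmul3(transpose3(inv_intrinsics(K2)), matmul3(E, inv_intrinsics(K1)))

def symmetric_epipolar_distance(F, p1, p2):
    """Mean point-to-epipolar-line distance (pixels) over both images."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    l2 = matvec3(F, x1)               # epipolar line of p1 in image 2
    l1 = matvec3(transpose3(F), x2)   # epipolar line of p2 in image 1
    r = abs(sum(x2[i] * l2[i] for i in range(3)))  # |x2^T F x1|
    return 0.5 * (r / math.hypot(l1[0], l1[1]) + r / math.hypot(l2[0], l2[1]))

def project(K, X):
    """Pinhole projection of a 3D point in camera coordinates to pixels."""
    x = matvec3(K, X)
    return (x[0] / x[2], x[1] / x[2])

# Illustrative two-view setup (all values are made up for the example).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
angle = math.radians(10.0)
c, s = math.cos(angle), math.sin(angle)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]   # rotation about the y axis
t = [0.5, 0.0, 0.1]

X1 = [0.3, -0.2, 4.0]                               # 3D point in camera 1
X2 = [a + b for a, b in zip(matvec3(R, X1), t)]     # same point in camera 2
p1, p2 = project(K, X1), project(K, X2)

F = fundamental(K, K, R, t)
exact_err = symmetric_epipolar_distance(F, p1, p2)
# Shift the label 2 px vertically to mimic an inaccurate correspondence.
noisy_err = symmetric_epipolar_distance(F, p1, (p2[0], p2[1] + 2.0))
print(f"exact label: {exact_err:.2e} px, perturbed label: {noisy_err:.2f} px")
```

A geometrically consistent label gives an error that is zero up to floating-point rounding, while the perturbed label gives an error on the order of the perturbation. Note that this metric only detects displacement across the epipolar line. A label that slides along the line, for example from a depth error under an exact pose, leaves it unchanged.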
