Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS
Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao
TL;DR
This work tackles SfM-free novel view synthesis, where inaccurate poses destabilize optimization under per-pixel losses. It introduces CG-3DGS, a two-stage approach that first uses 2D correspondences to guide pose optimization on a differentiable 3D Gaussian splatting representation, then optimizes the full scene with the learned poses. Key contributions include (i) a correspondence-based loss and a differentiable approximated surface rendering that supports gradient backpropagation to the poses, (ii) an efficient, cache-enabled pose estimation pipeline that applies SE(3) transformations to 3D Gaussians, and (iii) competitive NVS quality and pose accuracy on challenging datasets such as Tanks & Temples and CO3D-V2, with improved efficiency over SfM-dependent baselines. The approach remains robust under complex camera motions and provides a practical, SfM-free pathway to high-quality NVS.
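To make the "SE(3) transformations on 3D Gaussians" idea concrete, here is a minimal NumPy sketch of how a rigid transform acts on Gaussian means and covariances. It is not the authors' implementation: the function names `rotation_from_axis_angle` and `transform_gaussians` are hypothetical, and it relies only on the standard identities mu' = R mu + t and Sigma' = R Sigma R^T.

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def transform_gaussians(means, covs, R, t):
    """Apply a rigid transform (R, t) to N Gaussians (hypothetical helper).

    means: (N, 3) Gaussian centers; covs: (N, 3, 3) covariance matrices.
    """
    new_means = means @ R.T + t   # mu' = R mu + t, applied row-wise
    new_covs = R @ covs @ R.T     # Sigma' = R Sigma R^T, broadcast over N
    return new_means, new_covs
```

Moving the Gaussians rigidly is equivalent to moving the camera by the inverse transform, so a relative pose between two frames can be expressed as an update to (R, t) while the Gaussian shapes stay fixed. For example, a 90-degree rotation about the z axis turns a Gaussian elongated along x into one elongated along y.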
Abstract
Novel View Synthesis (NVS) without camera poses pre-processed by Structure-from-Motion (SfM), referred to as SfM-free NVS, is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing works rely on per-pixel image loss functions, such as L2 loss. In SfM-free methods, inaccurate initial poses cause pixel misalignment between the rendered and target images; under per-pixel image losses, this misalignment produces excessive gradients, leading to unstable optimization and poor convergence. In this study, we propose a correspondence-guided, SfM-free 3D Gaussian splatting method for NVS. We use correspondences between the target and the rendered result to achieve better pixel alignment, which facilitates the optimization of relative poses between frames. We then apply the learned poses to optimize the entire scene. Each 2D screen-space pixel is associated with its corresponding 3D Gaussians through approximated surface rendering to facilitate gradient backpropagation. Experimental results demonstrate the superior performance and time efficiency of the proposed approach compared to state-of-the-art baselines.
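The abstract does not spell out the correspondence loss, so the following is only an illustrative sketch of one plausible form: 3D surface points are projected with a candidate pose, and the loss is the mean squared 2D distance to their matched target pixels. The pinhole model, the function names `project` and `correspondence_loss`, and the parameters `focal` and `center` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def project(points, R, t, focal, center):
    """Pinhole projection of (N, 3) world points under pose (R, t) (assumed model)."""
    cam = points @ R.T + t                        # world -> camera coordinates
    return focal * cam[:, :2] / cam[:, 2:3] + center

def correspondence_loss(points, R, t, target_uv, focal, center):
    """Mean squared 2D distance between projected points and matched target pixels.

    Hypothetical stand-in for a correspondence-based loss: it compares pixel
    positions of matched points rather than pixel colors at fixed positions.
    """
    uv = project(points, R, t, focal, center)
    return float(np.mean(np.sum((uv - target_uv) ** 2, axis=1)))

# Toy setup: three points in front of the camera, identity pose as ground truth.
points = np.array([[0.0, 0.0, 4.0], [1.0, 0.0, 5.0], [0.0, 1.0, 6.0]])
center = np.array([50.0, 50.0])
target_uv = project(points, np.eye(3), np.zeros(3), 100.0, center)

for offset in (0.0, 0.1, 0.2):
    t = np.array([offset, 0.0, 0.0])              # translation error along x
    print(offset, correspondence_loss(points, np.eye(3), t, target_uv, 100.0, center))
```

In this toy setup the loss is zero at the true pose and grows with the translation error, so it still points toward the correct pose when the rendered and target images are badly misaligned. A per-pixel color loss carries no such explicit notion of where a matched point should move.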
