Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS

Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao

TL;DR

This work tackles SfM-free novel view synthesis, where pose inaccuracies hinder optimization under per-pixel losses. It introduces CG-3DGS, a two-stage approach that uses 2D correspondences to guide pose optimization on a differentiable 3D Gaussian splatting representation, followed by full scene optimization with the learned poses. Key contributions include (i) a correspondence-based loss together with differentiable approximated surface rendering, which lets gradients flow back to the pose parameters, (ii) a cache-enabled, efficient pose estimation pipeline that applies SE(3) transformations to 3D Gaussians, and (iii) competitive NVS quality and pose accuracy on challenging datasets such as Tanks & Temples and CO3D-V2, with improved efficiency over SfM-dependent baselines. The approach performs robustly under complex camera motions and offers a practical, SfM-free pathway to high-quality NVS.
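Item (ii) relies on moving a whole set of 3D Gaussians by a rigid SE(3) transformation to simulate the camera motion between adjacent frames. The following is a minimal pure-Python sketch of that operation, not the authors' implementation. It assumes each Gaussian is a dict with a `mean` and a unit `rot` quaternion in (w, x, y, z) order, and the helper names (`transform_gaussians`, `compose`) are hypothetical. Chaining per-frame relative poses with `compose` into poses relative to the first frame is likewise an assumption about how the estimated relative poses could be accumulated.

```python
from typing import Dict, List, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm
Vec3 = Tuple[float, float, float]


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a ⊗ b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q, i.e. q ⊗ (0, v) ⊗ conj(q)."""
    conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (x, y, z)


def transform_gaussians(gaussians: List[Dict], q_rel: Quat, t_rel: Vec3) -> List[Dict]:
    """Return copies of the Gaussians moved by the rigid transform (q_rel, t_rel).

    Only 'mean' and 'rot' change; every other field (scale, opacity, colour)
    is copied as-is. View-dependent SH colour is NOT rotated in this sketch.
    """
    moved = []
    for g in gaussians:
        m = quat_rotate(q_rel, g["mean"])
        new_g = dict(g)
        new_g["mean"] = (m[0] + t_rel[0], m[1] + t_rel[1], m[2] + t_rel[2])
        # R' = R_rel R, hence covariance R' S S^T R'^T = R_rel Σ R_rel^T.
        new_g["rot"] = quat_mul(q_rel, g["rot"])
        moved.append(new_g)
    return moved


def compose(second: Tuple[Quat, Vec3], first: Tuple[Quat, Vec3]) -> Tuple[Quat, Vec3]:
    """Pose that applies `first`, then `second` (hypothetical chaining helper)."""
    q2, t2 = second
    q1, t1 = first
    r = quat_rotate(q2, t1)
    return quat_mul(q2, q1), (r[0] + t2[0], r[1] + t2[1], r[2] + t2[2])
```

For example, a 90° rotation about the z-axis, q = (√0.5, 0, 0, √0.5), with translation (0, 0, 1) moves a Gaussian centered at (1, 0, 0) to (0, 1, 1). Rotating each Gaussian's orientation quaternion by the same rotation is equivalent to updating its covariance as Σ' = R Σ Rᵀ, so scales and opacities can be reused unchanged. In an actual optimizer, the quaternion and translation would be the learnable parameters while the Gaussians themselves stay frozen, as in Figure 1(b).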

Abstract

Novel View Synthesis (NVS) without camera poses pre-processed by Structure-from-Motion (SfM), referred to as SfM-free NVS, is crucial for enabling rapid response and improving robustness to variable operating conditions. Recent SfM-free methods integrate pose optimization into end-to-end frameworks that jointly estimate camera poses and perform NVS. However, most existing works rely on per-pixel image losses such as the L2 loss. In SfM-free settings, inaccurate initial poses cause misalignment between rendered and target images; under per-pixel losses, this misalignment produces excessive gradients, leading to unstable optimization and poor NVS convergence. In this study, we propose correspondence-guided SfM-free 3D Gaussian splatting for NVS. We use correspondences between the target image and the rendered result to achieve better pixel alignment, which facilitates the optimization of relative poses between frames. We then use the learned poses to optimize the entire scene. Each 2D screen-space pixel is associated with its corresponding 3D Gaussians through approximated surface rendering, enabling gradient backpropagation. Experimental results demonstrate the superior performance and time efficiency of the proposed approach compared with state-of-the-art baselines.
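The two mechanisms described above, approximated surface rendering and a correspondence-based loss, can be illustrated together. The sketch below alpha-blends the centers of the depth-sorted Gaussians covering one pixel into an approximate surface point, projects that point with a pinhole camera, and measures its distance to the matched pixel in the target image. It is a simplified, non-differentiable illustration under stated assumptions: the per-pixel alphas are taken as given, the matches are assumed to come from an off-the-shelf 2D matcher, normalizing by the accumulated weight is our own choice, and all function names are hypothetical. The paper's exact loss may differ.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def blended_surface_point(means: Sequence[Point3], alphas: Sequence[float]) -> Point3:
    """Alpha-blend the centres of the Gaussians covering one pixel.

    `means` and `alphas` must be sorted front to back; each alpha is the
    Gaussian's effective opacity at this pixel (opacity x 2D falloff).
    """
    transmittance = 1.0
    acc = [0.0, 0.0, 0.0]
    total_w = 0.0
    for mu, alpha in zip(means, alphas):
        w = alpha * transmittance  # standard front-to-back compositing weight
        for k in range(3):
            acc[k] += w * mu[k]
        total_w += w
        transmittance *= 1.0 - alpha
    if total_w == 0.0:
        raise ValueError("pixel is not covered by any Gaussian")
    # Assumption: divide by the accumulated weight so the result is a convex
    # combination of the centres even when the pixel is not fully opaque.
    return (acc[0] / total_w, acc[1] / total_w, acc[2] / total_w)


def project(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a camera-space point (z > 0) to pixel coordinates."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def correspondence_loss(
    surface_points: List[Point3],
    target_pixels: List[Pixel],
    intrinsics: Tuple[float, float, float, float],
) -> float:
    """Mean pixel distance between projected surface points and their matches."""
    total = 0.0
    for p, (tu, tv) in zip(surface_points, target_pixels):
        u, v = project(p, *intrinsics)
        total += math.hypot(u - tu, v - tv)
    return total / len(surface_points)
```

For instance, two Gaussians on the optical axis at depths 2 and 4 with alphas 0.5 and 1.0 each receive weight 0.5, giving the surface point (0, 0, 3); with intrinsics (100, 100, 50, 50) it projects to pixel (50, 50), and a match at (53, 54) yields a loss of 5 pixels. In a full pipeline the Gaussian centers would first be moved by the learnable SE(3) transformation, so the gradient of this loss would flow through the blended point back to the pose. The penalty is a geometric distance in pixels rather than a color difference between pixels that may not depict the same surface.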

Paper Structure

This paper contains 23 sections, 13 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of our CG-3DGS. (a) We use the camera intrinsics and the identity pose to back-project the estimated depth into a point cloud, which initializes a set of 3D Gaussians. (b) These 3D Gaussians are used to simulate camera pose changes between adjacent frames through SE(3) transformations. First, we update the parameters of $G_t$ based on the rendering results for frame $t$, and use the SE(3)-transformed $G_t$ as $G_{t+1}$ to render frame $t+1$. In this step, we freeze $G_t$ and update only the parameters of the SE(3) transformation. This process iterates until the relative poses between all adjacent frames in a video sequence have been estimated. The optimization is driven by the correspondences between the rendered result and the ground truth. (c) After pose estimation, the same point cloud is used to initialize a new set of 3D Gaussians for rendering the scene, and frames with estimated poses are randomly sampled for conventional training of these 3D Gaussians.
  • Figure 2: (a) Alpha-blending the center coordinates of the 3D Gaussians yields an approximate 3D surface point, which is then projected onto the 2D screen. (b) Comparison between traditional methods and ours. The fundamental difference lies in the technique used to align pixels.
  • Figure 3: Qualitative comparison for novel view synthesis on Tanks and Temples. Our approach produces more realistic rendering results than other baselines. Better viewed when zoomed in.
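Figure 1(a) initializes the Gaussians by back-projecting an estimated depth map with the camera intrinsics and the identity pose. A minimal sketch of that step follows. It assumes a pinhole model with focal lengths `fx`, `fy` and principal point `cx`, `cy`, takes pixel coordinates at integer positions, and uses hypothetical initial scale and opacity values; the function names are illustrative and the paper's actual initialization settings are not given here.

```python
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3]:
    """Lift an H x W depth map to 3D points with the identity camera pose.

    depth[v][u] is the estimated depth at pixel column u, row v. Because the pose
    is the identity, camera coordinates double as world coordinates. Pixels with
    non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            # Inverse of the pinhole projection u = fx * x / z + cx (same for v).
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def init_gaussians(points: List[Point3], scale: float = 0.01, opacity: float = 0.1) -> List[Dict]:
    """One Gaussian per point; `scale` and `opacity` are hypothetical defaults."""
    return [
        {
            "mean": p,
            "rot": (1.0, 0.0, 0.0, 0.0),  # identity quaternion (w, x, y, z)
            "scale": (scale, scale, scale),
            "opacity": opacity,
        }
        for p in points
    ]
```

Using the identity pose makes the first frame's camera the reference coordinate frame, so every Gaussian center lies where that camera observed it at the estimated depth.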