Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting

Chong Cheng, Gaochao Song, Yiyang Yao, Qinzheng Zhou, Gangjian Zhang, Hao Wang

TL;DR

This work tackles large-scale 3D scene reconstruction from uncalibrated images by introducing GraphGS, which combines spatial priors for rapid structure estimation with graph-guided 3D Gaussian Splatting optimization. A camera graph, derived from estimated camera relations, provides topology-aware constraints and an adaptive sampling mechanism that prevent overfitting to sparse viewpoints and accelerate convergence. Key contributions include Concentric Nearest Neighbor Pairing (CNNP) for selective image pairing, a Quadrant Filter to prune noisy matches, an octree-based initialization to reduce the number of initial points, and a graph-based multi-view consistency loss with betweenness-informed sampling. The method achieves state-of-the-art performance on outdoor benchmarks without ground-truth poses, offering a scalable and practical solution for open-scene reconstruction with potential impact on AR/VR and metaverse applications.

Abstract

This paper investigates the open research challenge of reconstructing high-quality, large 3D open scenes from images. Existing methods have notable limitations, such as requiring precise camera poses as input and dense viewpoints for supervision. To perform effective and efficient 3D scene reconstruction, we propose a novel graph-guided 3D scene reconstruction framework, GraphGS. Specifically, given a set of images of a scene captured by RGB cameras, we first design a spatial prior-based scene structure estimation method. Its output is then used to create a camera graph that encodes the camera topology. Further, we apply a graph-guided multi-view consistency constraint and an adaptive sampling strategy to the 3D Gaussian Splatting optimization process. This greatly alleviates the issue of Gaussian points overfitting to specific sparse viewpoints and expedites the 3D reconstruction process. We demonstrate that GraphGS achieves high-fidelity 3D reconstruction from images, with state-of-the-art performance in quantitative and qualitative evaluations across multiple datasets. Project Page: https://3dagentworld.github.io/graphgs.
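
The summary mentions betweenness-informed sampling on the camera graph without giving the rule. As a rough illustration only, the sketch below computes betweenness centrality with Brandes' algorithm on an unweighted, undirected camera graph and turns it into sampling probabilities. The adjacency-dict input, the function names, the `eps` smoothing term, and the choice to favor high-betweenness cameras are all assumptions made for this example, not the paper's actual strategy.

```python
import random
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


def sampling_weights(adj, eps=1.0):
    """Hypothetical rule: probability proportional to betweenness + eps."""
    bc = betweenness(adj)
    total = sum(c + eps for c in bc.values())
    return {v: (c + eps) / total for v, c in bc.items()}


def sample_cameras(adj, k, seed=0):
    """Draw k camera ids (with replacement) using the weights above."""
    weights = sampling_weights(adj)
    cams = list(weights)
    rng = random.Random(seed)
    return rng.choices(cams, weights=[weights[c] for c in cams], k=k)
```

For a chain of five cameras `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}`, the middle camera lies on the most shortest paths and therefore receives the largest weight under this assumed rule.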

Paper Structure

This paper contains 25 sections, 1 theorem, 16 equations, 11 figures, 10 tables.

Key Result

Proposition 1

Given two spatial vectors $d^{(i)}=[v^{(i)}_x,v^{(i)}_y,v^{(i)}_z]$ and $d^{(j)}=[v^{(j)}_x,v^{(j)}_y,v^{(j)}_z]$, their relative orientation has $8$ possible states, corresponding to the $8$ quadrants of the 3D coordinate system. The quadrant of relative orientation can be directly calculated via one cro where $\mathcal{B}_d^{(i,j)}$ is a three-digit binary number, representing the $8$ quadrants. $[d_{\tim
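
To illustrate how a three-digit binary number can index the $8$ quadrants, the sketch below packs the signs of a 3D vector's components into three bits. It does not reproduce the proposition's formula: the sign convention (a bit is 1 when the component is negative), the bit order (x is the most significant bit), and the function names are assumptions chosen for this example.

```python
def quadrant_code(v):
    """Pack the signs of (x, y, z) into an integer in [0, 7].

    Assumed convention: a bit is 1 when the component is negative,
    with x as the most significant bit and z as the least.
    """
    x, y, z = v
    return (int(x < 0) << 2) | (int(y < 0) << 1) | int(z < 0)


def quadrant_bits(v):
    """Return the same code as a three-character binary string."""
    return format(quadrant_code(v), "03b")
```

Applied to a relative vector derived from $d^{(i)}$ and $d^{(j)}$, two camera pairs fall in the same quadrant exactly when their codes are equal, which makes the comparison a single integer check.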

Figures (11)

  • Figure 1: Framework of the GraphGS method for efficient large 3D scene reconstruction. The process begins with spatial prior-based structure estimation, followed by octree-based efficient organization of initialization points. The camera graph is obtained at the end of structure estimation and contains the topology information of the scene cameras. The information in the camera graph is then used in the subsequent Gaussian optimization.
  • Figure 2: Illustration of spatial prior-based structure estimation, which has two parts: Concentric Nearest Neighbor Pairing (left) and Quadrant Filter (right). For Concentric Nearest Neighbor Pairing, we first select the $r$ cameras nearest to $c_i$ for matching to guarantee the stability of local bundle adjustment, then we select $w$ cameras from every $h+w$ cameras in distance order, forming a series of concentric circles. For the Quadrant Filter in the 2D case, camera $c_i$ is placed at the center of the coordinate system, pointing along the y-axis. For the other cameras (12 are shown), the relative position and orientation with respect to $c_i$ take $4\times4=16$ states.
  • Figure 3: Example of edge weights.
  • Figure 4: Qualitative comparison of novel view synthesis results on the Waymo and KITTI datasets, showing the Ground Truth alongside results from our GraphGS method, StreetSurf [guo2023streetsurf], PVG [chen2024periodic], and 3DGS [kerbl3Dgaussians]. Our approach is closer to the Ground Truth, highlighting the effectiveness of our reconstruction method across urban scenes of varying complexity.
  • Figure 5: Qualitative comparison of novel view synthesis on the Mill 19 large-scene dataset [meganerf], showing the Ground Truth alongside the results from our method and other state-of-the-art methods, including Mega-NeRF [meganerf], Switch-NeRF [mi2023switchnerf], and 3DGS [kerbl3Dgaussians].
  • ...and 6 more figures
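
The Concentric Nearest Neighbor Pairing described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions: cameras are given as rough 3D positions, distances are Euclidean, and within each block of $h+w$ cameras the first $h$ are skipped and the next $w$ are kept. The caption does not say which $w$ cameras of each block are chosen, so that detail and the function name `cnnp_partners` are hypothetical.

```python
import math


def cnnp_partners(positions, i, r, w, h):
    """Return indices of the cameras paired with camera i.

    positions: list of (x, y, z) camera positions (assumed spatial prior).
    r: number of nearest cameras that are always paired.
    w, h: beyond the nearest r, keep w cameras out of every h + w,
          walking outward in order of increasing distance.
    """
    ci = positions[i]
    # Rank all other cameras by distance to camera i.
    ranked = sorted(
        (j for j in range(len(positions)) if j != i),
        key=lambda j: math.dist(ci, positions[j]),
    )
    partners = ranked[:r]
    rest = ranked[r:]
    # Each block of h + w cameras forms one ring; keep w of them.
    for start in range(0, len(rest), h + w):
        ring = rest[start:start + h + w]
        partners.extend(ring[h:h + w])  # assumption: skip h, then keep w
    return partners
```

With 11 cameras spaced evenly along a line, `cnnp_partners(positions, 0, r=2, w=1, h=2)` keeps the two nearest cameras and then one camera from each full ring of three, so the number of pairs grows far more slowly than exhaustive matching.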

Theorems & Definitions (1)

  • Proposition 1