
Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis

Niluthpol Chowdhury Mithun, Tuan Pham, Qiao Wang, Ben Southall, Kshitij Minhas, Bogdan Matei, Stephan Mandt, Supun Samarasekera, Rakesh Kumar

TL;DR

The paper addresses robust, large-scale unconstrained 3D reconstruction and novel view synthesis in the presence of sparse input views and occlusions. It proposes GS-Diff, which couples 3D Gaussian Splatting with a multi-view diffusion prior to synthesize diffusion-augmented pseudo-views conditioned on nearby inputs, enabling more constrained optimization; Gaussians are represented by centers $\mu \in \mathbb{R}^3$, covariance $\Sigma$, opacity $\alpha$, and SH colors $c$. It adds monocular depth priors, appearance embeddings, dynamic-object handling, anisotropy regularization, and advanced rasterization to handle real-world variability. Experiments on four benchmarks show significant improvements over state-of-the-art baselines, especially under sparse-view conditions. This work advances scalable, high-fidelity 3D reconstruction and view synthesis in unconstrained environments, enabling more reliable real-world deployment of large-scale 3D capture systems.
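To make the Gaussian parameters named above concrete, here is a minimal sketch of one primitive in plain Python. The summary only lists $\mu$, $\Sigma$, $\alpha$, and SH colors $c$. The factorization $\Sigma = R S S^\top R^\top$ from a per-axis scale and a unit quaternion is the parameterization commonly used in 3DGS implementations and is assumed here, not taken from this paper. The names `Gaussian3D`, `quat_to_rot`, and `anisotropy` are hypothetical, and the max/min scale ratio is only one plausible quantity an anisotropy regularizer could penalize.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


def quat_to_rot(q: Quat) -> List[List[float]]:
    """Convert a quaternion (normalized here) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


@dataclass
class Gaussian3D:
    mu: Vec3          # center in R^3
    scale: Vec3       # per-axis standard deviations (assumed parameterization)
    quat: Quat        # orientation (assumed parameterization)
    opacity: float    # alpha in [0, 1]
    sh: List[float]   # spherical-harmonics color coefficients

    def covariance(self) -> List[List[float]]:
        """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
        R = quat_to_rot(self.quat)
        # M = R @ diag(scale): column j of R scaled by scale[j]
        M = [[R[i][j] * self.scale[j] for j in range(3)] for i in range(3)]
        # Sigma = M @ M^T
        return [
            [sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)
        ]

    def anisotropy(self) -> float:
        """Ratio of largest to smallest axis scale (hypothetical regularization target)."""
        return max(self.scale) / min(self.scale)


# An axis-aligned Gaussian stretched three times longer along z than along x.
g = Gaussian3D(mu=(0.0, 0.0, 0.0), scale=(1.0, 2.0, 3.0),
               quat=(1.0, 0.0, 0.0, 0.0), opacity=0.5, sh=[0.1, 0.2, 0.3])
sigma = g.covariance()
ratio = g.anisotropy()
```

Storing scale and rotation instead of $\Sigma$ directly keeps the covariance valid under gradient updates, and it makes elongated, needle-like Gaussians easy to detect through the scale ratio.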

Abstract

Recent advancements in 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have achieved impressive results in real-time 3D reconstruction and novel view synthesis. However, these methods struggle in large-scale, unconstrained environments where sparse and uneven input coverage, transient occlusions, appearance variability, and inconsistent camera settings lead to degraded quality. We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address these limitations. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones, enabling robust optimization even with sparse data. GS-Diff further integrates several enhancements, including appearance embedding, monocular depth priors, dynamic object modeling, anisotropy regularization, and advanced rasterization techniques, to tackle geometric and photometric challenges in real-world settings. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.

Paper Structure

This paper contains 10 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Brief Illustration of the Proposed GS-Diff Approach.
  • Figure 2: Iterative workflow of the integrated diffusion process with the proposed GS-Diff pipeline.
  • Figure 3: Comparison on the ULTRRA CM-2601 set (row-1), Photo Tourism Brandenburg Gate set (row-2), and WRIVA-AIDI 25 image set (row-3). Baselines (Left), Ours (Middle), GT (Right).