Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis

Niluthpol Chowdhury Mithun; Tuan Pham; Qiao Wang; Ben Southall; Kshitij Minhas; Bogdan Matei; Stephan Mandt; Supun Samarasekera; Rakesh Kumar

TL;DR

The paper addresses robust large-scale 3D reconstruction and novel view synthesis in unconstrained settings with sparse input views and occlusions. It proposes GS-Diff, which couples 3D Gaussian Splatting with a multi-view diffusion prior: the diffusion model synthesizes pseudo-views conditioned on nearby inputs, and these pseudo-views add constraints to the optimization. Each Gaussian is represented by a center $\mu \in \mathbb{R}^3$, a covariance $\Sigma$, an opacity $\alpha$, and spherical-harmonic (SH) colors $c$. GS-Diff also adds monocular depth priors, appearance embeddings, dynamic-object handling, anisotropy regularization, and advanced rasterization to handle real-world variability. Experiments on four benchmarks show significant improvements over state-of-the-art baselines, especially under sparse-view conditions. The work targets scalable, high-fidelity 3D reconstruction and view synthesis in unconstrained environments, with the aim of more reliable real-world deployment of large-scale 3D capture systems.
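To make the Gaussian parameters named above concrete, here is a minimal sketch of one Gaussian and its covariance. It assumes the factorization $\Sigma = R S S^\top R^\top$ commonly used in 3D Gaussian Splatting, with a unit quaternion for $R$ and per-axis log-scales for $S$; the summary here does not state how GS-Diff stores $\Sigma$. The class and function names (`Gaussian3D`, `covariance`, `anisotropy_penalty`) and the hinge-style penalty with threshold `max_ratio` are hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussian3D:
    """One splat: center mu, covariance factors, opacity alpha, SH colors c."""
    mu: np.ndarray         # center, shape (3,)
    log_scale: np.ndarray  # per-axis log std-dev, shape (3,) (assumed parameterization)
    quat: np.ndarray       # rotation quaternion (w, x, y, z), shape (4,) (assumed)
    alpha: float           # opacity in [0, 1]
    sh: np.ndarray         # SH color coefficients, shape (K, 3)


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(g: Gaussian3D) -> np.ndarray:
    """Sigma = R S S^T R^T, symmetric positive definite by construction."""
    R = quat_to_rotmat(g.quat)
    S = np.diag(np.exp(g.log_scale))
    return R @ S @ S.T @ R.T


def anisotropy_penalty(g: Gaussian3D, max_ratio: float = 10.0) -> float:
    """Hypothetical hinge penalty on the ratio of largest to smallest axis scale."""
    s = np.exp(g.log_scale)
    return float(max(s.max() / s.min() - max_ratio, 0.0))
```

Optimizing the log-scales and quaternion rather than $\Sigma$ directly keeps the covariance valid after every gradient step, which is why this factorization is the usual choice. A penalty of this kind discourages needle-like Gaussians, which is the general purpose of an anisotropy regularizer.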

Abstract

Recent advancements in 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have achieved impressive results in real-time 3D reconstruction and novel view synthesis. However, these methods struggle in large-scale, unconstrained environments where sparse and uneven input coverage, transient occlusions, appearance variability, and inconsistent camera settings lead to degraded quality. We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address these limitations. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones, enabling robust optimization even with sparse data. GS-Diff further integrates several enhancements, including appearance embedding, monocular depth priors, dynamic object modeling, anisotropy regularization, and advanced rasterization techniques, to tackle geometric and photometric challenges in real-world settings. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
