SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer

TL;DR

The paper tackles sparse-view 3D and 4D reconstruction with 3D Gaussian Splatting by introducing SplatFields, a neural framework that imposes spatial autocorrelation on splat features via a tri-plane CNN generator and neural fields. Moran's $I$ is used to quantify the spatial autocorrelation of splat attributes, and higher values correlate with better generalization. The method extends to 4D through time conditioning and a forward-flow field implemented with ResFields, achieving state-of-the-art performance among splatting-based methods in sparse-view dynamic scenes. Experiments on Blender, DTU, and Owlii demonstrate significant quality gains and real-time rendering after training, and the code is publicly released. Limitations arise in extremely sparse and rapidly moving scenes, suggesting future work on incorporating learned priors to narrow the gap with NeRF-based methods.
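Moran's $I$ can be computed for any scalar splat attribute once a neighborhood structure is fixed: $I = \frac{N}{W}\frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$ and $W = \sum_{i,j} w_{ij}$. The sketch below is a minimal illustration that assumes binary k-nearest-neighbor weights (k = 8) over splat centers; this neighborhood definition is an assumption and not necessarily the one used in the paper, and `morans_i` is a hypothetical helper name.

```python
import numpy as np

def morans_i(points: np.ndarray, values: np.ndarray, k: int = 8) -> float:
    """Global Moran's I of one scalar attribute over splat centers.

    points: (N, 3) splat centers; values: (N,) attribute per splat.
    Binary k-nearest-neighbor weights are an illustrative assumption.
    """
    n = points.shape[0]
    # Dense pairwise squared distances: O(N^2) memory, acceptable for a sketch.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)              # a splat is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest splats
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], nbrs] = 1.0      # w_ij = 1 if j is among i's k-NN
    z = values - values.mean()
    # (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return float(n / w.sum() * (z @ w @ z) / (z @ z))

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(500, 3))
smooth = np.sin(2 * pts[:, 0]) + pts[:, 1]   # spatially coherent attribute
noisy = rng.normal(size=500)                 # independent per-splat values
print(morans_i(pts, smooth), morans_i(pts, noisy))
```

A spatially smooth attribute should give a clearly positive value, while independent per-splat noise should stay near the null expectation of roughly $-1/(N-1)$, which mirrors the contrast the paper draws between regularized and unregularized splat features.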

Abstract

Digitizing 3D static scenes and 4D dynamic events from multi-view images has long been a challenge in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a practical and scalable reconstruction method, gaining popularity due to its impressive reconstruction quality, real-time rendering capabilities, and compatibility with widely used visualization tools. However, the method requires a substantial number of input views to achieve high-quality scene reconstruction, introducing a significant practical bottleneck. This challenge is especially severe in capturing dynamic scenes, where deploying an extensive camera array can be prohibitively costly. In this work, we identify the lack of spatial autocorrelation of splat features as one of the factors contributing to the suboptimal performance of the 3DGS technique in sparse reconstruction settings. To address the issue, we propose an optimization strategy that effectively regularizes splat features by modeling them as the outputs of a corresponding implicit neural field. This results in a consistent enhancement of reconstruction quality across various scenarios. Our approach effectively handles static and dynamic cases, as demonstrated by extensive testing across different setups and scene complexities.
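The idea of treating splat attributes as outputs of a neural field, rather than as free per-splat parameters, can be sketched with a plain coordinate MLP. This is a simplified stand-in: the paper conditions its fields on tri-plane features from a CNN generator, whereas this sketch feeds raw positions. The layer sizes and output activations (exp for scale, a normalized quaternion for rotation, sigmoid for color and opacity, following common 3DGS practice) are assumptions, and all function names are hypothetical.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Random weights for a small tanh MLP (sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=1.0 / np.sqrt(i), size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Shared field: every splat is evaluated with the same weights."""
    for j, (w, b) in enumerate(params):
        x = x @ w + b
        if j < len(params) - 1:
            x = np.tanh(x)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def splat_attributes(params, p):
    """Map splat centers p (K, 3) to scale, rotation, color, and opacity."""
    out = mlp(params, p)                      # (K, 3 + 4 + 3 + 1)
    scale = np.exp(out[:, 0:3] - 4.0)         # strictly positive; offset keeps splats small
    quat = out[:, 3:7] / np.linalg.norm(out[:, 3:7], axis=1, keepdims=True)
    color = sigmoid(out[:, 7:10])             # RGB in (0, 1)
    opacity = sigmoid(out[:, 10])             # alpha in (0, 1)
    return scale, quat, color, opacity

params = init_mlp([3, 64, 64, 11])
p = np.random.default_rng(1).uniform(-1, 1, size=(1000, 3))
scale, quat, color, opacity = splat_attributes(params, p)
# Nearby splats receive nearly identical attributes because they share one smooth function.
_, _, color_near, _ = splat_attributes(params, p + 1e-4)
print(np.abs(color - color_near).max())
```

Because the only free parameters are the shared weights, a gradient step driven by one training view changes the attributes of neighboring splats together. This is one way to see why modeling features as field outputs tends to raise their spatial autocorrelation.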

Paper Structure (16 sections, 15 equations, 5 figures, 12 tables)

Figures (5)

  • Figure 1: SplatFields regularizes 3D Gaussian Splatting (3DGS) [GaussianSplatting] by predicting the splat features and locations via neural fields to improve reconstruction from unconstrained sparse views. We measure the spatial autocorrelation (Moran's I [moran1950notes]) of splat features in local neighborhoods to assess their similarity and observe that the better reconstruction quality achieved by our method corresponds to a higher Moran's I. The figure shows a static reconstruction from ten calibrated images of the Blender dataset [mildenhall2020nerf]. Metrics are reported on the full test set; the rendered view is a novel view.
  • Figure 2: Overview. SplatFields takes as input a point cloud (e.g., initialized from SfM [schoenberger2016sfm]), for which it models the geometric attributes (position $\mathbf{p}_k$, scale $\mathbf{s}_k$, rotation $\mathbf{O}_k$) and appearance attributes (color $\mathbf{c}_k$, opacity $\alpha_k$). These attributes represent the point set as 3D splats, which are then rendered with the 3DGS rasterizer [GaussianSplatting]. First, the point location set $\{\mathbf{p}_k\in \mathbb{R}^3\}_{k=1}^K$ is encoded into features $\{\mathbf{f}_k\}_{k=1}^{K}$ by sampling the tri-plane representation generated by a CNN generator $g_\theta$, which provides a deep structural prior [ulyanov2018deep] on the feature values. These features are then passed through a deformation MLP $f_\Theta$ to refine the point locations $\hat{\mathbf{p}}_k$. The refined point set, together with the features, is then passed through a series of compact neural fields that predict the properties of the rendering primitives $\{\mathcal{G}_k\}_{k=1}^K$, which can be rendered from arbitrary viewpoints. During optimization, we adopt the adaptive density control of 3DGS [GaussianSplatting] to periodically prune and densify the point set. SplatFields adapts seamlessly to 4D reconstruction by conditioning the neural fields on the time step $t$ and introducing an additional time-conditioned flow field. Gray blocks indicate learnable modules.
  • Figure 3: Static reconstruction of Blender [mildenhall2020nerf] scenes for the setup from Tab. \ref{tab:exp_static_blender}.
  • Figure 4: Monocular reconstruction of dynamic sequences from the NeRF-DS dataset [nerfds], compared with recent state-of-the-art methods. Where the FPS column shows two values separated by a forward slash, the first is the rendering speed without neural network inference (the rendering primitives are extracted and stored for each frame) and the second is the speed with neural network inference.
  • Figure 5: Three-view reconstruction on DTU [DTU]; PSNR values are averaged across all 15 scenes. See Tab. \ref{tab:app:dtu_comparison} for per-scene scores.
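The first stage of the Figure 2 pipeline, encoding point locations $\mathbf{p}_k$ into features $\mathbf{f}_k$ by sampling a tri-plane, can be sketched as follows. Assumptions: the three planes are random arrays standing in for the output of the CNN generator $g_\theta$, points are normalized to $[-1, 1]^3$, the align-corners bilinear convention is used, and features from the xy, xz, and yz planes are summed (concatenation is an equally common choice). The paper's exact choices may differ, and the helper names are hypothetical.

```python
import numpy as np

def bilinear_sample(plane: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, R, R) feature plane at uv in [-1, 1]^2 -> (N, C)."""
    _, r, _ = plane.shape
    # Map [-1, 1] to continuous pixel coordinates [0, R - 1] (align-corners convention).
    xy = (uv + 1.0) * 0.5 * (r - 1)
    x0 = np.clip(np.floor(xy[:, 0]).astype(int), 0, r - 2)
    y0 = np.clip(np.floor(xy[:, 1]).astype(int), 0, r - 2)
    tx, ty = xy[:, 0] - x0, xy[:, 1] - y0
    # plane is indexed [channel, row (y), column (x)]; each gather is (C, N).
    f00, f01 = plane[:, y0, x0], plane[:, y0, x0 + 1]
    f10, f11 = plane[:, y0 + 1, x0], plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - tx) * (1 - ty) + f01 * tx * (1 - ty)
           + f10 * (1 - tx) * ty + f11 * tx * ty)
    return out.T

def triplane_features(planes: np.ndarray, p: np.ndarray) -> np.ndarray:
    """planes: (3, C, R, R) for the xy, xz, yz planes; p: (K, 3) -> features (K, C)."""
    p = np.clip(p, -1.0, 1.0)                 # keep samples inside the planes
    f_xy = bilinear_sample(planes[0], p[:, [0, 1]])
    f_xz = bilinear_sample(planes[1], p[:, [0, 2]])
    f_yz = bilinear_sample(planes[2], p[:, [1, 2]])
    return f_xy + f_xz + f_yz                 # summation is an assumed aggregation

rng = np.random.default_rng(0)
planes = rng.normal(size=(3, 8, 16, 16))      # stand-in for g_theta output: C=8, R=16
points = rng.uniform(-1, 1, size=(100, 3))
feats = triplane_features(planes, points)     # one feature vector per splat
print(feats.shape)  # (100, 8)
```

Since each feature is interpolated from a shared grid that is itself produced by a convolutional generator, splats that fall in the same grid cells receive correlated features, which is consistent with the deep structural prior the caption attributes to $g_\theta$.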