SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer
TL;DR
The paper tackles sparse-view 3D and 4D reconstruction with 3D Gaussian Splatting (3DGS) by introducing SplatFields, a neural framework that imposes spatial autocorrelation on splat features via a tri-plane CNN generator and neural fields. Moran's $I$ is used to quantify the spatial locality of splat attributes; higher values correlate with better generalization. The method extends to 4D through time conditioning and a forward-flow field implemented with ResFields, achieving state-of-the-art performance among splatting-based methods on sparse-view dynamic scenes. Experiments on Blender, DTU, and Owlii demonstrate significant quality gains and real-time rendering after training, and the code is publicly released. The method remains limited in extremely sparse-view and rapidly moving scenes, which suggests future work on incorporating learned priors to narrow the gap with NeRF-based methods.
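To make the Moran's $I$ measure concrete, the following is a minimal sketch of the standard global statistic, $I = \frac{N}{W} \frac{\sum_{i,j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$ with $W = \sum_{i,j} w_{ij}$, applied to one scalar attribute per splat. The binary k-nearest-neighbor weights and the helper names `knn_weights` and `morans_i` are assumptions made for illustration; the summary above does not state which neighborhood definition the paper uses.

```python
import math

def knn_weights(points, k):
    """Binary spatial weights: w[i][j] = 1 if j is among the k nearest neighbors of i.
    (Assumed neighborhood definition, chosen for illustration only.)"""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            weights[i][j] = 1.0
    return weights

def morans_i(values, weights):
    """Global Moran's I of one scalar attribute under the given spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance_sum = sum(d * d for d in dev)
    weight_sum = sum(sum(row) for row in weights)
    if variance_sum == 0 or weight_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / weight_sum) * (cross / variance_sum)

# Toy example: eight splat centers on a line, with one attribute per splat.
points = [(float(i), 0.0, 0.0) for i in range(8)]
weights = knn_weights(points, k=2)
smooth = [float(i) for i in range(8)]           # varies gradually in space
alternating = [float(i % 2) for i in range(8)]  # flips between neighbors

print("smooth:", morans_i(smooth, weights))            # positive: neighbors are similar
print("alternating:", morans_i(alternating, weights))  # negative: neighbors differ
```

A spatially smooth attribute yields a positive score and an attribute that flips between neighbors yields a negative one, which is the sense in which a higher $I$ indicates stronger locality.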
Abstract
Digitizing 3D static scenes and 4D dynamic events from multi-view images has long been a challenge in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a practical and scalable reconstruction method, gaining popularity due to its impressive reconstruction quality, real-time rendering capabilities, and compatibility with widely used visualization tools. However, the method requires a substantial number of input views to achieve high-quality scene reconstruction, introducing a significant practical bottleneck. This challenge is especially severe in capturing dynamic scenes, where deploying an extensive camera array can be prohibitively costly. In this work, we identify the lack of spatial autocorrelation of splat features as one of the factors contributing to the suboptimal performance of the 3DGS technique in sparse reconstruction settings. To address the issue, we propose an optimization strategy that effectively regularizes splat features by modeling them as the outputs of a corresponding implicit neural field. This results in a consistent enhancement of reconstruction quality across various scenarios. Our approach effectively handles static and dynamic cases, as demonstrated by extensive testing across different setups and scene complexities.
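The regularization idea, predicting splat features from a neural field rather than storing free per-splat parameters, can be illustrated with a toy sketch. The two-layer tanh MLP, its sizes, and the names `make_field`, `splat_features`, and `lipschitz_bound` are hypothetical stand-ins; this is not the paper's tri-plane CNN generator. The sketch shows only why such a parameterization ties nearby splats together: the field is a continuous function of position, so the feature difference between two splats is bounded by a constant times their distance.

```python
import math
import random

def make_field(in_dim=3, hidden=16, out_dim=4, seed=0):
    """Tiny two-layer tanh MLP with fixed random weights (toy stand-in for a neural field).
    In training, these weights would be optimized instead of per-splat parameters."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(out_dim)]
    b2 = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return w1, b1, w2, b2

def splat_features(field, center):
    """Evaluate the field at one splat center and return its feature vector."""
    w1, b1, w2, b2 = field
    h = [math.tanh(sum(w * x for w, x in zip(row, center)) + b)
         for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]

def lipschitz_bound(field):
    """Upper bound on max-norm feature change per unit max-norm position change.
    Uses the induced infinity norm (max absolute row sum); tanh is 1-Lipschitz."""
    w1, _, w2, _ = field
    def inf_norm(matrix):
        return max(sum(abs(w) for w in row) for row in matrix)
    return inf_norm(w1) * inf_norm(w2)

field = make_field()
a = (0.10, 0.20, 0.30)
b = (0.10, 0.20, 0.31)  # a nearby splat center
gap = max(abs(x - y) for x, y in zip(a, b))
diff = max(abs(p - q) for p, q in zip(splat_features(field, a), splat_features(field, b)))
print("feature difference:", diff, "bound:", lipschitz_bound(field) * gap)
```

Independently optimized per-splat parameters carry no such constraint, so under sparse views neighboring splats are free to take unrelated values.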
