Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis
Yiwei Li, Jiannong Cao, Penghui Ruan, Divya Saxena, Songye Zhu, Yinfeng Cao
TL;DR
This work tackles sparse-input dynamic view synthesis by introducing Geometry-Consistent 4D Gaussian Splatting (GC-4DGS). It adds a dynamic consistency checking mechanism that keeps only reliable multi-view stereo depths for fusion, and a global-local depth regularization that aligns monocular depth priors with the 4D geometry, yielding coherent depth and appearance in dynamic scenes. With only three input views, the approach achieves state-of-the-art rendering quality on N3DV and Technicolor and remains deployable on edge devices, which highlights its practical potential for AIoT applications. Overall, by tightly integrating geometry priors with 4D Gaussian optimization, GC-4DGS enables real-time, high-fidelity dynamic view synthesis under sparse-view constraints.
Abstract
Gaussian Splatting has emerged as a new approach to view synthesis of dynamic scenes and shows great potential in AIoT applications such as digital twins. However, recent dynamic Gaussian Splatting methods degrade significantly when only sparse input views are available, which limits their practical applicability. The issue stems from incoherent learning of 4D geometry as the number of input views decreases. This paper presents GC-4DGS, a novel framework that infuses geometric consistency into 4D Gaussian Splatting (4DGS), offering real-time, high-quality dynamic scene rendering from sparse input views. Although learning-based Multi-View Stereo (MVS) and monocular depth estimators (MDEs) provide geometry priors, directly integrating them with 4DGS yields suboptimal results because sparse-input 4D geometric optimization is ill-posed. To address these problems, we introduce a dynamic consistency checking strategy that reduces the estimation uncertainty of MVS across spacetime. Furthermore, we propose a global-local depth regularization approach that distills spatiotemporally consistent geometric information from monocular depths, thereby enhancing coherent learning of geometry and appearance within the 4D volume. Extensive experiments on the popular N3DV and Technicolor datasets validate the effectiveness of GC-4DGS, which improves rendering quality without sacrificing efficiency. Notably, our method outperforms RF-DeRF, the latest dynamic radiance field tailored for sparse-input dynamic view synthesis, and the original 4DGS by 2.62 dB and 1.58 dB in PSNR, respectively, while remaining seamlessly deployable on resource-constrained IoT edge devices.
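The abstract names its two geometry mechanisms without giving formulas. As a rough illustration only, the sketch below implements generic versions of the ideas they build on, not GC-4DGS itself: (1) a standard two-view forward-backward reprojection check of the kind commonly used to filter MVS depth maps before fusion (static scene, shared intrinsics; the paper's dynamic, spacetime variant is not reproduced), and (2) a least-squares scale-and-shift alignment of a relative monocular depth map to rendered depth, fitted once for the whole image (global) and once per patch (local). All function names, thresholds, and the patch size are hypothetical choices. NumPy only computes loss values here; actual training would need an autodiff framework so that gradients reach the Gaussians.

```python
import numpy as np


def reprojection_consistency_mask(depth_ref, depth_src, K, T_src_ref,
                                  pix_thresh=1.0, rel_depth_thresh=0.01):
    """Generic two-view forward-backward depth check (hypothetical helper).

    T_src_ref: 4x4 rigid transform from reference-camera to source-camera coordinates.
    Returns an HxW boolean mask of reference pixels whose depth the source view confirms.
    """
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(float)  # 3 x N homogeneous
    K_inv = np.linalg.inv(K)
    R, t = T_src_ref[:3, :3], T_src_ref[:3, 3:4]

    # Forward: back-project reference pixels, move them into the source camera, project.
    X_src = R @ (K_inv @ pix * depth_ref.ravel()) + t
    valid = X_src[2] > 1e-6
    proj = K @ X_src / np.where(valid, X_src[2], 1.0)
    us, vs = np.round(proj[0]).astype(int), np.round(proj[1]).astype(int)
    valid &= (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    us, vs = np.clip(us, 0, W - 1), np.clip(vs, 0, H - 1)

    # Backward: read the source depth (nearest neighbour) and map the point back.
    pix_src = np.stack([us, vs, np.ones(H * W)]).astype(float)
    X_back = R.T @ (K_inv @ pix_src * depth_src[vs, us] - t)  # R.T inverts a rigid rotation
    z_back = X_back[2]
    valid &= z_back > 1e-6
    proj_back = K @ X_back / np.where(z_back > 1e-6, z_back, 1.0)

    # Keep pixels whose round trip lands near the start pixel with a similar depth.
    pix_err = np.hypot(proj_back[0] - pix[0], proj_back[1] - pix[1])
    depth_err = np.abs(z_back - depth_ref.ravel()) / np.maximum(depth_ref.ravel(), 1e-6)
    return (valid & (pix_err < pix_thresh) & (depth_err < rel_depth_thresh)).reshape(H, W)


def fit_scale_shift(x, y, mask):
    """Closed-form least squares for (s, b) minimising sum over mask of (s*x + b - y)^2."""
    xs, ys = x[mask], y[mask]
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, ys, rcond=None)
    return s, b


def global_local_depth_loss(rendered, mono, patch=8, mask=None):
    """L1 gaps after aligning relative monocular depth to rendered depth (hypothetical).

    Global term: a single (s, b) for the whole image. Local term: one (s, b) per
    non-overlapping patch, so the prior only has to agree on local structure.
    """
    if mask is None:
        mask = np.ones(rendered.shape, dtype=bool)
    s, b = fit_scale_shift(mono, rendered, mask)
    global_loss = np.abs(s * mono + b - rendered)[mask].mean()

    H, W = rendered.shape
    local = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            m = mask[i:i + patch, j:j + patch]
            if m.sum() < 2:
                continue  # too few pixels to fit both a scale and a shift
            r, d = rendered[i:i + patch, j:j + patch], mono[i:i + patch, j:j + patch]
            s_l, b_l = fit_scale_shift(d, r, m)
            local.append(np.abs(s_l * d + b_l - r)[m].mean())
    local_loss = float(np.mean(local)) if local else 0.0
    return float(global_loss), local_loss
```

For example, identical reference and source depth maps under the identity pose pass the check at every pixel, while doubling the source depth fails it everywhere. For the regularizer, a rendered depth built as 2× the monocular map on the left half and 4× on the right gives a large global loss but a near-zero local loss with 8-pixel patches. This illustrates why pairing a global term with a local one can tolerate monocular priors whose scale drifts across the image.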
