Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis

Yiwei Li, Jiannong Cao, Penghui Ruan, Divya Saxena, Songye Zhu, Yinfeng Cao

TL;DR

This work tackles sparse-input dynamic view synthesis by introducing Geometry-Consistent 4D Gaussian Splatting (GC-4DGS). It adds a dynamic consistency checking mechanism to fuse robust multi-view depths and a global-local depth regularization to align monocular priors with 4D geometry, yielding coherent depth and appearance in dynamic scenes. The approach achieves state-of-the-art rendering quality on N3DV and Technicolor with only three input views and remains deployable on edge devices, highlighting strong practical potential for AIoT applications. Overall, GC-4DGS enables real-time, high-fidelity dynamic view synthesis under sparse-view constraints by tightly integrating geometry priors with 4D Gaussian optimization.

Abstract

Gaussian Splatting has emerged as a promising approach to view synthesis of dynamic scenes, with great potential in AIoT applications such as digital twins. However, recent dynamic Gaussian Splatting methods degrade significantly when only sparse input views are available, which limits their applicability in practice. The issue arises from the incoherent learning of 4D geometry as the number of input views decreases. This paper presents GC-4DGS, a novel framework that infuses geometric consistency into 4D Gaussian Splatting (4DGS), offering real-time, high-quality dynamic scene rendering from sparse input views. While learning-based Multi-View Stereo (MVS) and monocular depth estimators (MDEs) provide geometry priors, directly integrating them with 4DGS yields suboptimal results due to the ill-posed nature of sparse-input 4D geometric optimization. To address these problems, we introduce a dynamic consistency checking strategy to reduce the estimation uncertainties of MVS across spacetime. Furthermore, we propose a global-local depth regularization approach to distill spatiotemporally consistent geometric information from monocular depths, thereby enhancing coherent geometry and appearance learning within the 4D volume. Extensive experiments on the popular N3DV and Technicolor datasets validate the effectiveness of GC-4DGS in rendering quality without sacrificing efficiency. Notably, our method outperforms RF-DeRF, the latest dynamic radiance field tailored for sparse-input dynamic view synthesis, and the original 4DGS by 2.62 dB and 1.58 dB in PSNR, respectively, with seamless deployability on resource-constrained IoT edge devices.
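To put the reported PSNR gains in perspective, the following minimal sketch uses the standard PSNR definition for images scaled to a known peak value. The function names `psnr` and `mse_ratio_for_gain` are illustrative and not taken from the paper. Under this definition, a gain of 2.62 dB corresponds to a mean squared error roughly 1.8 times smaller, and a gain of 1.58 dB to roughly 1.4 times smaller.

```python
import numpy as np


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return float(10.0 ** (gain_db / 10.0))


if __name__ == "__main__":
    # Gains quoted in the abstract, converted to MSE reduction factors.
    for gain in (2.62, 1.58):
        print(f"{gain} dB -> MSE reduced by a factor of {mse_ratio_for_gain(gain):.2f}")
```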

Paper Structure

This paper contains 15 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Our Geometry-Consistent 4D Gaussian Splatting (GC-4DGS) achieves high-fidelity rendering quality with only 3 input views. (a) Existing dynamic Gaussian Splatting methods, e.g., 4DGaussians [wu20244d], learn incorrect 4D geometry from sparse training views. (b) GC-4DGS solves this issue by learning consistent geometry from both MVS and monocular depths, achieving realistic appearance and coherent geometry of dynamic scenes.
  • Figure 2: Framework Overview. (a) We introduce a dynamic consistency checking strategy to fuse view-consistent metric depths from a learning-based MVS method, which is then employed to obtain point clouds for Gaussian initialization and to supervise the learning of 4D geometry. (b) We propose a global-local depth regularization method to distill robust geometry information from a pre-trained MDE, which ensures consistent depth ranking while maintaining local patch smoothness. The optimization is conducted through temporal slicing, differentiable rendering, and color and depth supervision.
  • Figure 3: Global-Local Depth Regularization. For a randomly selected pixel pair $(\boldsymbol{u}, \boldsymbol{u'})$, we enforce that the relative order in $D_{ren}$ is consistent with that in $D_{mde}$. Additionally, we encourage local smoothness by applying absolute supervision on normalized depth patches. These methods enable our model to distill robust geometric information from the monocular depth maps, thereby facilitating the learning of 4D geometry.
  • Figure 4: Qualitative results on N3DV Dataset with 3 input views. GC-4DGS achieves consistent improvement in both static and dynamic regions when compared to other state-of-the-art dynamic radiance fields.
  • Figure 5: Qualitative results on Technicolor Dataset with 3 input views. HyperReel [attal2023hyperreel] and 4DGaussians [wu20244d] produce significantly distorted images, while STG [li2024spacetime] struggles to capture complex dynamics. 4DGS [yang2024real] and E-D3DGS [bae2025per] are prone to overfitting in areas with limited observations and produce blurred results.
  • ...and 2 more figures
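
The global-local depth regularization described in the Figure 3 caption can be sketched roughly as follows. This is an illustrative NumPy approximation, not the authors' implementation: the hinge form of the ranking term, the margin, the patch size, the mean/std patch normalization, and all function names are assumptions, since the captions do not specify them. It also assumes that $D_{ren}$ and $D_{mde}$ share the same orientation (larger values mean farther away).

```python
import numpy as np


def global_rank_loss(d_ren, d_mde, num_pairs=1024, margin=1e-4, rng=None):
    """Penalize random pixel pairs whose depth order in d_ren disagrees with d_mde.

    Assumed hinge form: max(0, margin - s * (d_ren[u] - d_ren[u'])),
    where s is the sign of d_mde[u] - d_mde[u'].
    """
    rng = np.random.default_rng() if rng is None else rng
    ren, mde = d_ren.ravel(), d_mde.ravel()
    i = rng.integers(0, ren.size, size=num_pairs)
    j = rng.integers(0, ren.size, size=num_pairs)
    sign = np.sign(mde[i] - mde[j])
    valid = sign != 0  # skip ties, which carry no ordering information
    if not np.any(valid):
        return 0.0
    diff = ren[i][valid] - ren[j][valid]
    return float(np.maximum(0.0, margin - sign[valid] * diff).mean())


def normalize_patch(patch, eps=1e-6):
    """Remove the per-patch offset and scale (assumed mean/std normalization)."""
    return (patch - patch.mean()) / (patch.std() + eps)


def local_patch_loss(d_ren, d_mde, patch_size=8, num_patches=16, rng=None):
    """Mean absolute difference between normalized depth patches."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = d_ren.shape
    losses = []
    for _ in range(num_patches):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        p_ren = normalize_patch(d_ren[y:y + patch_size, x:x + patch_size])
        p_mde = normalize_patch(d_mde[y:y + patch_size, x:x + patch_size])
        losses.append(np.abs(p_ren - p_mde).mean())
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_mde = np.arange(32 * 32, dtype=np.float64).reshape(32, 32) / 1024.0
    d_ren = 2.0 * d_mde + 3.0  # same ordering, different scale and shift
    print(global_rank_loss(d_ren, d_mde, rng=rng))
    print(local_patch_loss(d_ren, d_mde, rng=rng))
```

Both terms ignore the absolute scale of the monocular depth: the ranking term depends only on pairwise order, and the patch term compares depths after per-patch normalization. This is one plausible way to use a scale-ambiguous monocular prior alongside metric MVS depths.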