Disentangled 4D Gaussian Splatting: Rendering High-Resolution Dynamic World at 343 FPS

Hao Feng, Hao Sun, Wei Xie, Zhi Zuo, Zhengzhe Liu

TL;DR

Disentangled4DGS tackles real-time dynamic novel view synthesis by decoupling temporal and spatial factors in 4D Gaussian representations and adopting a projection-first rendering pipeline. By modeling a 4D Gaussian with a 3D base, temporal scaling, and velocity of the mean, and projecting to ray space before slicing, the method avoids expensive 4D-to-3D recomputation and achieves high FPS with reduced storage. A flow-gradient guided consistency loss together with a temporal splitting strategy improves motion fidelity and reduces artifacts, especially at motion boundaries. Evaluations across Plenoptic, Google Immersive, HyperNeRF, and D-NeRF datasets show a new performance benchmark, delivering up to 343 FPS at $1352\times1014$ on an RTX 3090 and at least 4.5% storage reduction while outperforming prior 4DGS methods in both quality and speed.
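The disentangled parameterization described above (a 3D base, a temporal scale, and a velocity of the mean) can be illustrated with a small sketch. This is not the authors' implementation. The class name, the field names, the linear motion model `mean + velocity * dt`, and the Gaussian falloff in time are all assumptions made here for illustration, since this summary does not state the exact formulas.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Disentangled4DGaussian:
    """Hypothetical container; all field names are illustrative assumptions."""
    mean: np.ndarray      # (3,) spatial mean at the temporal center
    cov3d: np.ndarray     # (3, 3) static 3D base covariance
    velocity: np.ndarray  # (3,) velocity of the mean
    t_center: float       # time at which the Gaussian is most visible
    t_scale: float        # temporal scale (standard deviation along time)
    opacity: float        # peak opacity


def slice_at_time(g: Disentangled4DGaussian, t: float):
    """Evaluate the Gaussian at time t without building a 4x4 covariance."""
    dt = t - g.t_center
    # Assumed linear motion of the mean.
    mean_t = g.mean + g.velocity * dt
    # Assumed 1D Gaussian falloff of opacity away from the temporal center.
    weight = g.opacity * np.exp(-0.5 * (dt / g.t_scale) ** 2)
    return mean_t, g.cov3d, weight


g = Disentangled4DGaussian(
    mean=np.zeros(3),
    cov3d=np.eye(3),
    velocity=np.array([1.0, 0.0, 0.0]),
    t_center=0.5,
    t_scale=0.1,
    opacity=0.8,
)
mean_t, cov_t, weight = slice_at_time(g, 0.6)
print(mean_t, weight)
```

Under these assumptions, evaluating a Gaussian at a new timestamp costs one multiply-add on the mean and one scalar exponential, which is the kind of saving the TL;DR attributes to avoiding 4D-to-3D recomputation.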

Abstract

While dynamic novel view synthesis from 2D videos has seen progress, efficient reconstruction and rendering of dynamic scenes remains challenging. In this paper, we introduce Disentangled 4D Gaussian Splatting (Disentangled4DGS), a novel representation and rendering pipeline that achieves real-time performance without compromising visual fidelity. Disentangled4DGS decouples the temporal and spatial components of 4D Gaussians, avoiding the slicing-first step and the four-dimensional matrix calculations required by prior methods. By projecting temporal and spatial deformations into dynamic 2D Gaussians and deferring temporal processing, we minimize the redundant computations of 4DGS. Our approach also features a gradient-guided flow loss and a temporal splitting strategy to reduce artifacts. Experiments demonstrate a significant improvement in rendering speed and quality, achieving 343 FPS when rendering $1352\times1014$ resolution images on a single RTX 3090 while reducing storage requirements by at least 4.5%. Our approach sets a new benchmark for dynamic novel view synthesis, outperforming existing methods on both multi-view and monocular dynamic scene datasets.
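To see why deferring temporal processing can remove redundant work, consider a minimal sketch that assumes the projection is locally affine, as in the Jacobian-based approximation commonly used in Gaussian splatting. Under that assumption, projecting the mean and the velocity once and then slicing in 2D gives the same result as slicing in 3D and projecting at every timestamp. The matrix `A`, the offset `b`, and the function names are hypothetical, and the paper's actual ray-space formulation may differ.

```python
import numpy as np


def project_once(A, b, mean3d, vel3d, cov3d):
    """Project mean, velocity, and covariance with an assumed affine map x -> A x + b."""
    mean2d = A @ mean3d + b
    vel2d = A @ vel3d            # velocities transform by the linear part only
    cov2d = A @ cov3d @ A.T      # standard covariance transform under a linear map
    return mean2d, vel2d, cov2d


def slice_2d(mean2d, vel2d, dt):
    """Projection-first: advance the already-projected mean in 2D."""
    return mean2d + vel2d * dt


def slice_then_project(A, b, mean3d, vel3d, dt):
    """Slicing-first: advance the mean in 3D, then project again."""
    return A @ (mean3d + vel3d * dt) + b


rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))      # stand-in for a local projection Jacobian
b = rng.normal(size=2)
mean3d = rng.normal(size=3)
vel3d = rng.normal(size=3)
cov3d = np.eye(3)

mean2d, vel2d, cov2d = project_once(A, b, mean3d, vel3d, cov3d)
for dt in (-0.2, 0.0, 0.3):
    same = np.allclose(slice_2d(mean2d, vel2d, dt),
                       slice_then_project(A, b, mean3d, vel3d, dt))
    print(dt, same)
```

The equivalence holds exactly only for an affine map. A true perspective projection is nonlinear, so in practice the agreement depends on how well the local linearization holds over the Gaussian's temporal extent.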

Paper Structure

This paper contains 24 sections, 18 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: We present Disentangled 4D Gaussian Splatting, a highly efficient approach that renders 1352 × 1014 resolution images at 343 FPS on an RTX 3090 on the Plenoptic Dataset, surpassing previous approaches in both rendering quality and speed. Note that the x-axis uses a logarithmic scale.
  • Figure 2: Comparison between "slicing-first" 4D Gaussian Splatting and our Disentangled 4D Gaussian Splatting. The upper row shows the slicing-first 4D Gaussian Splatting method, which needs to slice each 4D Gaussian into a 3D Gaussian. This approach requires computing high-dimensional covariance matrices and performing repeated slicing and projection operations, leading to inefficiency and temporal discontinuity. In contrast, our "projection-first" disentangled formulation preserves temporal information throughout the rendering pipeline, enabling efficient rasterization and continuous, temporally coherent image synthesis.
  • Figure 3: Rendering pipeline of our Disentangled 4DGS. After initialization, we first project the 3D Gaussians and the velocity of the mean orthogonally to the timeline, obtaining a 2D Gaussian sphere with velocity in ray space. The projected 2D Gaussians with velocity are then sliced to obtain static 2D Gaussians in ray space, which are rasterized to produce the image. Gradients from the loss are back-propagated to optimize the 4D Gaussians and guide the adaptive density control.
  • Figure 4: Visual comparisons on Plenoptic Video Dataset
  • Figure 5: Visual comparisons on Google Immersive Dataset
  • ...and 4 more figures