SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis

Jipeng Lyu, Jiahua Dong, Yu-Xiong Wang

TL;DR

SCas4D tackles persistent dynamic scene modeling and 4D novel-view synthesis by leveraging hierarchical patterns in 3D Gaussian Splatting. It introduces a cascaded, coarse-to-fine deformation framework that clusters Gaussians into multiple layers and composes their deformations via $\Theta_t = D_t(\Theta_{t-1})$, enabling online reconstruction with about 100 iterations per frame and substantial speedups over state-of-the-art methods. The approach delivers competitive rendering quality and dense point tracking, and it also supports self-supervised articulated object segmentation; results are demonstrated on real and accelerated synthetic datasets. This work advances practical online dynamic scene capture for applications in AR/VR, robotics, and autonomous systems.
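
Unrolling the per-frame update shows what "online" means here. This is a sketch under one assumption: $\Theta_0$ (a symbol not defined in this summary) denotes the Gaussian parameters fitted on the first frame. Applying the update repeatedly then gives:

$$
\Theta_t = D_t(\Theta_{t-1}) = \left(D_t \circ D_{t-1} \circ \cdots \circ D_1\right)(\Theta_0)
$$

Under this reading, each new frame requires estimating only one new deformation $D_t$, starting from the Gaussians of the previous frame.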

Abstract

Persistent dynamic scene modeling for tracking and novel-view synthesis remains challenging due to the difficulty of capturing accurate deformations while maintaining computational efficiency. We propose SCas4D, a cascaded optimization framework that leverages structural patterns in 3D Gaussian Splatting for dynamic scenes. The key idea is that real-world deformations often exhibit hierarchical patterns, where groups of Gaussians share similar transformations. By progressively refining deformations from coarse part-level to fine point-level, SCas4D achieves convergence within 100 iterations per time frame and produces results comparable to existing methods with only one-twentieth of the training iterations. The approach also demonstrates effectiveness in self-supervised articulated object segmentation, novel view synthesis, and dense point tracking tasks.

Paper Structure

This paper contains 26 sections, 11 equations, 15 figures, 8 tables, 1 algorithm.

Figures (15)

  • Figure 1: Our method achieves satisfactory rendering results with 100 training iterations per frame. Leveraging learned deformation information, we also demonstrate successful articulated object segmentation.
  • Figure 2: Our method first utilizes the Gaussians from the previous frame $t-1$ and the new inputs for frame $t$ to learn the deformation $D$ between these two frames. These Gaussians are organized into cascaded clusters with $K$ layers. For each cluster layer, we learn a deformation function. Finally, the deformation $D$ of each Gaussian is obtained by nesting these deformation functions.
  • Figure 3: Illustration of the deformation function parameters applied to a cluster: initial state (first), rotation (second), translation (third), and scaling (fourth).
  • Figure 4: Visual comparison of rendering results on FastParticle after 100 iterations per frame training.
  • Figure 5: Articulated object segmentation results.
  • ...and 10 more figures
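
The nesting described in the Figure 2 and Figure 3 captions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each cluster carries one rotation, one translation, and one uniform scale applied about the cluster's centroid, that layers are applied coarse to fine, and that only Gaussian centers are deformed. The names `deform_layer`, `cascaded_deform`, and the layer dictionary keys are hypothetical, and the per-frame optimization that would learn these parameters is omitted.

```python
import numpy as np

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cluster_centers(points, labels, n_clusters):
    """Centroid of each cluster, computed from the current point positions."""
    sums = np.zeros((n_clusters, 3))
    np.add.at(sums, labels, points)  # accumulate points per cluster
    counts = np.bincount(labels, minlength=n_clusters)[:, None]
    return sums / np.maximum(counts, 1)

def deform_layer(points, labels, rotations, translations, scales):
    """Apply one cluster layer (assumed form): rotate and scale each point
    about its cluster centroid, then translate it with the cluster."""
    centers = cluster_centers(points, labels, len(scales))[labels]  # (N, 3)
    local = points - centers
    rotated = np.einsum("nij,nj->ni", rotations[labels], local)
    return scales[labels][:, None] * rotated + centers + translations[labels]

def cascaded_deform(points, layers):
    """Nest the per-layer deformations, coarsest layer first."""
    out = points
    for layer in layers:
        out = deform_layer(out, **layer)
    return out

# Toy example: four Gaussian centers, K = 2 layers.
pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])

coarse = {  # one cluster holding every point: shared part-level motion
    "labels": np.zeros(4, dtype=int),
    "rotations": np.stack([rotation_z(np.pi / 2)]),
    "translations": np.array([[1.0, 0.0, 0.0]]),
    "scales": np.array([1.0]),
}
fine = {  # one cluster per point: small point-level residual
    "labels": np.arange(4),
    "rotations": np.stack([np.eye(3)] * 4),
    "translations": np.array([[0.0, 0.0, 0.0]] * 3 + [[0.0, 0.0, 0.5]]),
    "scales": np.ones(4),
}

deformed = cascaded_deform(pts, [coarse, fine])
print(deformed)
```

In this toy setup the coarse layer moves all four points with a single shared transform, and the fine layer adds a residual offset to the last point only. This mirrors the idea that most of the motion is explained at the part level and only small corrections remain at the point level.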