Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling

Junli Deng, Yihao Luo

TL;DR

This work tackles dynamic scene rendering with 4D Gaussian Splatting, addressing temporal coherence and motion artifacts. It introduces a State Consistency Filter that fuses neural deformation-predicted observations with prior Gaussian states, and grounds Gaussian dynamics in Wasserstein geometry through Log/Exp maps on the Gaussian manifold, enabling smooth, physically plausible evolution. The approach combines a Kalman-like state update, Wasserstein distance regularization, and a neural deformation field to produce temporally coherent, high-quality renderings, validated on synthetic and real datasets with strong gains in PSNR, SSIM, and perceptual quality while maintaining real-time capabilities. Overall, the work offers a principled framework that unifies optimal transport with state-space estimation to advance dynamic 3D scene representation, with potential impact on real-time rendering, AR/VR, and robotics.

Abstract

Dynamic scene rendering has taken a leap forward with the rise of 4D Gaussian Splatting, but there's still one elusive challenge: how to make 3D Gaussians move through time as naturally as they would in the real world, all while keeping the motion smooth and consistent. In this paper, we unveil a fresh approach that blends state-space modeling with Wasserstein geometry, paving the way for a more fluid and coherent representation of dynamic scenes. We introduce a State Consistency Filter that merges prior predictions with the current observations, enabling Gaussians to stay true to their way over time. We also employ Wasserstein distance regularization to ensure smooth, consistent updates of Gaussian parameters, reducing motion artifacts. Lastly, we leverage Wasserstein geometry to capture both translational motion and shape deformations, creating a more physically plausible model for dynamic scenes. Our approach guides Gaussians along their natural way in the Wasserstein space, achieving smoother, more realistic motion and stronger temporal coherence. Experimental results show significant improvements in rendering quality and efficiency, outperforming current state-of-the-art techniques.
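
As a rough illustration of the Wasserstein distance regularization mentioned in the abstract, the squared 2-Wasserstein distance between two Gaussians has a well-known closed form: $W_2^2 = \|\boldsymbol{\mu}_1-\boldsymbol{\mu}_2\|^2 + \mathrm{tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$. The sketch below evaluates it with NumPy. The function names `sqrtm_psd` and `w2_squared` are hypothetical, and how the paper weights or aggregates this term in its loss is not stated here, so this should be read as a minimal sketch rather than the authors' implementation.

```python
import numpy as np

def sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T      # V diag(sqrt(w)) V^T

def w2_squared(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    s1 = sqrtm_psd(cov1)
    cross = sqrtm_psd(s1 @ cov2 @ s1)  # (cov1^{1/2} cov2 cov1^{1/2})^{1/2}
    bures = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sum((mu1 - mu2) ** 2) + max(bures, 0.0))

# Example: a unit translation along x plus a change of axis scales.
mu_a, cov_a = np.zeros(3), np.diag([1.0, 4.0, 9.0])
mu_b, cov_b = np.array([1.0, 0.0, 0.0]), np.diag([4.0, 4.0, 1.0])
d2 = w2_squared(mu_a, cov_a, mu_b, cov_b)
```

For covariances that share the same principal axes, the covariance term reduces to the squared difference of the per-axis standard deviations, which is why the distance penalizes both translation and shape change in a single quantity.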

Paper Structure

This paper contains 26 sections, 18 equations, 7 figures, 7 tables, 3 algorithms.

Figures (7)

  • Figure 1: Overview of our proposed method. Starting from a Structure-from-Motion (SfM) point cloud, we initialize canonical 3D Gaussians including position $\boldsymbol{\mu}^c$, rotation $\mathbf{R}^c$, and scale $\mathbf{S}^c$ parameters. The deformation network predicts these parameters $(\boldsymbol{\mu},\mathbf{R},\mathbf{S})$ at different timestamps $\gamma(t)$. In the Wasserstein space, our state-updating mechanism merges predictions with observations, while regularization ensures temporal coherence between frames. The merged Gaussians are then rendered via differentiable rasterization.
  • Figure 2: Gaussian dynamics modeling in Wasserstein space. The velocity $v_t$ is computed via the logarithmic map between $\mathcal{N}_{t-1}$ and $\mathcal{N}_t$, then used to predict $\mathcal{N}^P_{t+1}$ through the exponential map. Gray regions show log/exp map operations in the Wasserstein space; the blue region represents the current state.
  • Figure 3: Qualitative results on the synthetic dataset. Zoom in for details.
  • Figure 4: Qualitative results on the real-world dataset. Zoom in for details.
  • Figure 5: Optical Flow Visualization. Our method naturally derives a speed field by computing 3D motions for all Gaussian points and projecting them to 2D optical flow. Left: Raw observed flow with noticeable noise. Middle: Predicted flow with the filter, showing clearer motion boundaries and better dynamic-static separation. Right: Residual map indicating the consistency between observation and prediction.
  • ...and 2 more figures
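
The log/exp map step described in the Figure 2 caption can be sketched with the standard closed forms for Gaussians under the 2-Wasserstein metric. The log map at $\mathcal{N}(\boldsymbol{\mu}_0,\Sigma_0)$ of $\mathcal{N}(\boldsymbol{\mu}_1,\Sigma_1)$ is the pair $(\boldsymbol{\mu}_1-\boldsymbol{\mu}_0,\ T-I)$, where $T=\Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}$ is the optimal transport map, and the exp map applies $(I+V)\Sigma_0(I+V)^\top$. The names `log_map`, `exp_map`, and `predict_next` are hypothetical. Predicting $\mathcal{N}^P_{t+1}$ by extending the geodesic from $\mathcal{N}_{t-1}$ to twice its length is an assumption made here for simplicity; the paper may instead apply the velocity at $\mathcal{N}_t$ or use a different step size.

```python
import numpy as np

def sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def log_map(mu0, cov0, mu1, cov1):
    """Tangent vector at N(mu0, cov0) pointing to N(mu1, cov1); cov0 must be invertible."""
    s0 = sqrtm_psd(cov0)
    s0_inv = np.linalg.inv(s0)
    t = s0_inv @ sqrtm_psd(s0 @ cov1 @ s0) @ s0_inv   # optimal transport map T
    return mu1 - mu0, t - np.eye(len(mu0))

def exp_map(mu0, cov0, v_mu, v_cov, step=1.0):
    """Move from N(mu0, cov0) along the tangent (v_mu, v_cov) scaled by `step`."""
    a = np.eye(len(mu0)) + step * v_cov
    return mu0 + step * v_mu, a @ cov0 @ a.T

def predict_next(mu_prev, cov_prev, mu_cur, cov_cur):
    """Assumed predictor: extrapolate the geodesic prev -> cur one more step."""
    v_mu, v_cov = log_map(mu_prev, cov_prev, mu_cur, cov_cur)
    return exp_map(mu_prev, cov_prev, v_mu, v_cov, step=2.0)

# Example: a Gaussian that moves along x while stretching along x.
mu_prev, cov_prev = np.zeros(3), np.eye(3)
mu_cur, cov_cur = np.array([1.0, 0.0, 0.0]), np.diag([4.0, 1.0, 1.0])
mu_pred, cov_pred = predict_next(mu_prev, cov_prev, mu_cur, cov_cur)
```

In this example the standard deviation along x grows from 1 to 2 between the two observed frames, so the extrapolated Gaussian has standard deviation 3 along x. Along a Wasserstein geodesic between Gaussians with shared axes, standard deviations change linearly rather than variances, which is one way the geometry couples translation and shape deformation.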