Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh

TL;DR

The paper tackles dynamic scene reconstruction with 3D Gaussian Splatting by replacing traditional coordinate-based deformation with per-Gaussian latent embeddings conditioned on frame-specific temporal embeddings. It introduces a coarse-fine deformation strategy to capture slow versus fast motions and adds a local smoothness regularization to promote coherent deformations among neighboring Gaussians, improving detail in dynamic regions. Across multiple datasets, the approach yields clearer dynamic details and faster rendering than prior deformable Gaussian methods, with ablations validating the necessity of both deformation components and the embedding-based design. While effective, the method encounters challenges with casually captured monocular videos, indicating future work is needed to incorporate priors for such settings and further improve robustness.

Abstract

As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions. Project page: https://jeongminb.github.io/e-d3dgs/
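
The core idea in the abstract, predicting each Gaussian's deformation from its own latent embedding plus a temporal embedding rather than from its coordinates, can be illustrated with a small sketch. This is not the authors' implementation: the decoder architecture (one hidden layer with tanh), the embedding sizes, the helper names (`make_decoder`, `decode`, `deform_position`), the restriction to position offsets only, and the choice to add the coarse and fine outputs are all assumptions made for illustration.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_decoder(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical tiny decoder: one hidden layer, randomly initialised."""
    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]
    return {
        "w1": init(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "w2": init(out_dim, hidden_dim), "b2": [0.0] * out_dim,
    }


def decode(decoder, gaussian_embedding, temporal_embedding):
    """Map (per-Gaussian embedding, temporal embedding) to a 3D offset.

    The Gaussian's xyz coordinate is deliberately NOT an input: the
    deformation depends on the embedding, not on the position.
    """
    x = gaussian_embedding + temporal_embedding  # list concatenation
    hidden = [math.tanh(v) for v in linear(decoder["w1"], decoder["b1"], x)]
    return linear(decoder["w2"], decoder["b2"], hidden)


def deform_position(canonical_xyz, z_gaussian, z_time_coarse, z_time_fine,
                    coarse_decoder, fine_decoder):
    """Assumed combination: canonical position + coarse offset + fine offset."""
    coarse = decode(coarse_decoder, z_gaussian, z_time_coarse)
    fine = decode(fine_decoder, z_gaussian, z_time_fine)
    return [p + c + f for p, c, f in zip(canonical_xyz, coarse, fine)]


# Illustrative sizes only; the source section does not state them.
EMBED_DIM, TIME_DIM = 4, 2
rng = random.Random(0)
coarse_decoder = make_decoder(EMBED_DIM + TIME_DIM, 8, 3, rng)
fine_decoder = make_decoder(EMBED_DIM + TIME_DIM, 8, 3, rng)

z_a = [0.1, -0.2, 0.3, 0.0]   # embedding of Gaussian A
z_b = [0.5, 0.4, -0.3, 0.2]   # embedding of Gaussian B
z_time_coarse = [0.0, 1.0]    # slow-changing temporal embedding for a frame
z_time_fine = [0.3, -0.7]     # fast-changing temporal embedding for a frame
canonical = [0.0, 0.0, 0.0]   # both Gaussians share the same canonical position

pos_a = deform_position(canonical, z_a, z_time_coarse, z_time_fine,
                        coarse_decoder, fine_decoder)
pos_b = deform_position(canonical, z_b, z_time_coarse, z_time_fine,
                        coarse_decoder, fine_decoder)
print(pos_a)
print(pos_b)
```

In this sketch, two Gaussians placed at the same canonical position receive different offsets because their embeddings differ. A purely coordinate-based deformation field, which sees only position and time, cannot make that distinction.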

Paper Structure (31 sections, 5 equations, 15 figures, 9 tables)

This paper contains 31 sections, 5 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: Overview. (a) Existing deformable 3D Gaussian Splatting methods show blurry results in complex dynamic scenes, even with deformation fields using finer feature grids. (b) Our model solves the problem by employing per-Gaussian latent embeddings to predict deformations for each Gaussian and achieves clearer results.
  • Figure 2: Framework. Existing coordinate-based network methods struggle to represent complex dynamic scenes. To this end, we define per-Gaussian deformation. (a) Firstly, we assign a latent embedding for each Gaussian. Additionally, we introduce coarse and fine temporal embeddings to represent the slow and fast state of the dynamic scene. (b) By employing two decoders that take per-Gaussian latent embeddings along with coarse and fine temporal embeddings as input, we estimate slow or large changes and fast or detailed changes to model the final deformation, respectively. (c) Finally, we introduce a local smoothness regularization so that the embeddings of neighboring Gaussians are similar.
  • Figure 3: Qualitative comparisons on the Neural 3D Video dataset.
  • Figure 4: Qualitative comparisons on the Technicolor dataset.
  • Figure 5: Qualitative comparisons on the HyperNeRF dataset.
  • ...and 10 more figures
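
The local smoothness regularization described in the Figure 2(c) caption, which encourages neighboring Gaussians to have similar embeddings, could be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's loss: the brute-force k-nearest-neighbor search, the unweighted mean of squared embedding differences, and the function names `knn_indices` and `embedding_smoothness_loss` are all hypothetical choices.

```python
import math


def knn_indices(positions, k):
    """Brute-force k nearest neighbours by Euclidean distance (O(n^2 log n))."""
    neighbors = []
    for i, p in enumerate(positions):
        others = [j for j in range(len(positions)) if j != i]
        others.sort(key=lambda j: math.dist(p, positions[j]))
        neighbors.append(others[:k])
    return neighbors


def embedding_smoothness_loss(positions, embeddings, k=1):
    """Mean squared embedding difference between each Gaussian and its k neighbours."""
    total, count = 0.0, 0
    for i, nbrs in enumerate(knn_indices(positions, k)):
        for j in nbrs:
            total += sum((a - b) ** 2
                         for a, b in zip(embeddings[i], embeddings[j]))
            count += 1
    return total / count if count else 0.0


# Two tight clusters of Gaussians along the x axis.
positions = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
             [5.0, 0.0, 0.0], [5.1, 0.0, 0.0]]

# Embeddings that agree within each cluster incur no penalty.
smooth_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Embeddings that differ between spatial neighbours are penalised.
rough_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

smooth_loss = embedding_smoothness_loss(positions, smooth_embeddings, k=1)
rough_loss = embedding_smoothness_loss(positions, rough_embeddings, k=1)
print(smooth_loss, rough_loss)
```

Neighbors are found from Gaussian positions, while the penalty acts on the embeddings. Nearby Gaussians are therefore pushed toward similar deformations, but distant Gaussians remain free to move independently.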