GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao

TL;DR

GauFRe tackles monocular dynamic scene reconstruction by representing a scene as deformable 3D Gaussians in a canonical space, warped forward to each time step by a time-conditioned deformation field. It adds a 3D Gaussian-specific static component and an inductive bias-aware initialization so that the deformation field can focus on moving regions, and the whole pipeline is optimized end-to-end with a self-supervised rendering loss. On synthetic and real-world dynamic datasets, the method reports competitive rendering quality at higher efficiency than previous NeRF-based and Gaussian-based methods: on real-world scenes it trains in about 20 minutes and renders in real time at 96 FPS on a single RTX 3090. Limitations remain, including overfitting risks and difficulty with large motions and thin structures, which point to improved densification and motion modeling as future work.

Abstract

We propose a method that achieves state-of-the-art rendering quality and efficiency on monocular dynamic scene reconstruction using deformable 3D Gaussians. Implicit deformable representations commonly model motion with a canonical space and time-dependent backward-warping deformation field. Our method, GauFRe, uses a forward-warping deformation to explicitly model non-rigid transformations of scene geometry. Specifically, we propose a template set of 3D Gaussians residing in a canonical space, and a time-dependent forward-warping deformation field to model dynamic objects. Additionally, we tailor a 3D Gaussian-specific static component supported by an inductive bias-aware initialization approach which allows the deformation field to focus on moving scene regions, improving the rendering of complex real-world motion. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Experiments show our method achieves competitive results and higher efficiency than both previous state-of-the-art NeRF and Gaussian-based methods. For real-world scenes, GauFRe can train in ~20 mins and offer 96 FPS real-time rendering on an RTX 3090 GPU. Project website: https://lynl7130.github.io/gaufre/index.html
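
The forward-warping idea in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and is not the authors' implementation: `ToyDeformationField` is a single random linear layer with a `tanh` standing in for the learned deformation network, the sin/cos input encoding is an assumed choice, and only the Gaussian means are warped (rotation, scale, opacity, and color are omitted, as is rendering). The sketch only shows the data flow: canonical Gaussians are moved to time $t$ by a time-conditioned offset, while static Gaussians are left unchanged in world space.

```python
import math
import random


def positional_encoding(x, num_freqs=2):
    """Sin/cos features of a scalar (an assumed input encoding, not taken from the paper)."""
    out = [x]
    for k in range(num_freqs):
        out += [math.sin((2 ** k) * x), math.cos((2 ** k) * x)]
    return out


class ToyDeformationField:
    """Hypothetical stand-in for the time-conditioned deformation field.

    Maps (canonical position, time) to a 3D position offset using one
    randomly initialized linear layer followed by tanh.
    """

    def __init__(self, num_freqs=2, seed=0, scale=0.1):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        in_dim = 4 * (1 + 2 * num_freqs)  # x, y, z, t, each encoded
        self.weights = [
            [rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(3)
        ]

    def offset(self, position, t):
        feats = []
        for value in (*position, t):
            feats += positional_encoding(value, self.num_freqs)
        return [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in self.weights]


def gaussians_at_time(canonical_means, static_means, field, t):
    """Forward-warp canonical means to time t, then append the unchanged static means."""
    warped = []
    for mean in canonical_means:
        delta = field.offset(mean, t)
        warped.append([m + d for m, d in zip(mean, delta)])
    return warped + [list(m) for m in static_means]


# Usage: two deformable Gaussians in canonical space, one static Gaussian in world space.
field = ToyDeformationField()
canonical = [[0.0, 0.0, 0.0], [1.0, 0.5, -0.2]]
static = [[2.0, 2.0, 2.0]]
frame = gaussians_at_time(canonical, static, field, t=0.5)
```

In this sketch the static Gaussian bypasses the deformation field entirely, which mirrors the stated motivation for the static component: the field only has to explain the moving regions of the scene.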

Paper Structure (26 sections, 6 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: (a) GauFRe's dynamic scene reconstruction results on the real-world NeRF-DS dataset [yan2023nerfds]. (b) Our static component improves dynamic object rendering. (c) PSNR, real-time rendering performance, and optimization time (circle size) of state-of-the-art NeRF-based methods [TiNeuVox, park2021hypernerf] and Gaussian-Splatting-based methods [wu20234dgaussians, yang2023deformable] on NeRF-DS at 480$\times$270 resolution.
  • Figure 2: An overview of our dynamic scene representation. At each time frame $t$, our method reconstructs the scene as a combination of static and deformable anisotropic 3D Gaussians. The features of the deformable Gaussians are optimized in a canonical space and warped into frame $t$ using a deformation field. The static Gaussians are optimized in world space.
  • Figure 3: Separating deformable and static regions improves quality in dynamic regions. Left to right: ground truth, our method, deformable Gaussians with no static component, and the results of Wu et al.'s 4D Gaussians method [wu20234dgaussians].
  • Figure 4: Visualizing the static and deformable 3D Gaussians optimized by our method.
  • Figure 5: Qualitative comparison of GauFRe and the baseline methods on test views from the D-NeRF dataset [pumarola2021d] (400$\times$400). All methods reproduce the rough geometry, but sufficient sampling is necessary to reproduce the fine detail. Our approach can efficiently spread Gaussians to both static and dynamic regions to maximize quality, producing the sharpest image of all compared methods.
  • ...and 1 more figure