Learning Explicit Continuous Motion Representation for Dynamic Gaussian Splatting from Monocular Videos

Xuankai Zhang, Junjin Xiao, Shangwei Huang, Wei-shi Zheng, Qing Zhang

Abstract

We present an approach for high-quality dynamic Gaussian Splatting from monocular videos. Going one step beyond previous methods, we explicitly model the continuous position and orientation deformation of dynamic Gaussians using SE(3) B-spline motion bases with a compact set of control points. To improve computational efficiency while enhancing the ability to model complex motions, we devise an adaptive control mechanism that dynamically adjusts the number of motion bases and control points. In addition, we develop a soft segment reconstruction strategy to mitigate long-interval motion interference, and employ a multi-view diffusion model to provide multi-view cues that prevent overfitting to the training views. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in novel view synthesis. Our code is available at https://github.com/hhhddddddd/se3bsplinegs.
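For readers unfamiliar with the core representation, the sketch below shows how a continuous SE(3) trajectory can be evaluated from a compact set of control poses via a cumulative uniform cubic B-spline, a standard construction for spline trajectories on SE(3). This is a minimal NumPy/SciPy illustration under our own assumptions (uniformly spaced control poses given as 4x4 homogeneous matrices); the function names are hypothetical, and the paper's actual formulation and learnable implementation live in the repository linked above.

```python
# Minimal sketch (not the paper's code): cumulative cubic B-spline on SE(3),
# assuming uniformly spaced control poses. Names are illustrative only.
import numpy as np
from scipy.linalg import expm, logm

def cumulative_basis(u):
    """Cumulative uniform cubic B-spline weights for local parameter u in [0, 1]."""
    u2, u3 = u * u, u ** 3
    return np.array([
        (5.0 + 3.0 * u - 3.0 * u2 + u3) / 6.0,
        (1.0 + 3.0 * u + 3.0 * u2 - 2.0 * u3) / 6.0,
        u3 / 6.0,
    ])

def se3_bspline_pose(control_poses, t):
    """Evaluate the spline at normalized time t in [0, 1], given >= 4 control poses."""
    n = len(control_poses)
    # Map t to a segment index i and a local parameter u in [0, 1].
    s = t * (n - 3)
    i = min(int(s), n - 4)
    u = s - i
    w = cumulative_basis(u)
    T = control_poses[i].copy()
    for k in range(3):
        # Relative motion between consecutive control poses, scaled in the Lie algebra.
        rel = np.linalg.inv(control_poses[i + k]) @ control_poses[i + k + 1]
        T = T @ expm(w[k] * logm(rel).real)
    return T

# Example: four control poses translating along x; query the pose at mid-time.
poses = [np.eye(4) for _ in range(4)]
for k, T in enumerate(poses):
    T[0, 3] = float(k)
T_mid = se3_bspline_pose(poses, 0.5)
print(T_mid[:3, 3])  # blended translation; x = 1.5 for this configuration
```

Note that a B-spline approximates rather than interpolates its control poses, which is what lets a small number of control points represent a smooth, continuous trajectory.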


Figures (14)

  • Figure 1: Dynamic Gaussian Splatting from monocular videos. Our method synthesizes high-quality novel views from monocular videos, while the compared methods, e.g., MoSca [mosca], HiMoR [himor], and SplineGS [splinegs], fail to faithfully reconstruct the dynamic windmill.
  • Figure 2: Overview of our method. We first initialize static Gaussians via depth reprojection and dynamic Gaussians from tracking points, modeling the transformations of the latter with learnable SE(3) B-spline Motion Bases. We then adjust the number of motion bases and control points via an adaptive control mechanism. Next, we employ a soft segment reconstruction strategy to fuse dynamic Gaussians at different reference timestamps into the observation timestamp, and further supplement the monocular video with scene-level multi-view cues derived from a multi-view diffusion model.
  • Figure 3: Visual comparison of novel view synthesis on the iPhone dataset [iphone].
  • Figure 4: Visual comparison of novel view synthesis on the NVIDIA dataset. Note that images are cropped to highlight dynamic regions.
  • Figure 5: Visual ablation study on scenes from the iPhone and NVIDIA datasets.
  • ...and 9 more figures