H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting

Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang

TL;DR

H3D-DGS introduces heterogeneous 3D (H3D) control points that decouple observable from unobservable 3D motion in deformable 3D Gaussian splatting. Observable motion is derived directly from optical flow under a local parallel-ray model, and only the unobservable components are learned, which lets the method converge faster than approaches that optimize all motion by gradient descent and keeps it robust on real-world dynamic scenes. The method is integrated into a streaming pipeline with 3D segmentation, residual compensation, and GoS-based updates. It outperforms state-of-the-art deformable 3D Gaussian splatting methods on the Neu3DV and CMU-Panoptic datasets, converging in around 100 iterations at about 2 seconds per frame on an RTX 4070. The result is a compact motion representation with improved reconstruction fidelity and practical streaming capability. Current limitations include a dependence on an initial static reconstruction and on multi-view input.

Abstract

Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points, and for compactness, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods predominantly rely on gradient-based optimization for all motion information. Because of the many degrees of freedom involved, they struggle to converge on real-world datasets exhibiting complex motion. To preserve the compactness of the motion representation and address these convergence challenges, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained using a hybrid strategy that combines optical flow back-projection with gradient-based methods. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, components of 3D motion that project onto the image plane are acquired directly via optical flow back-projection, while unobservable portions are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques. Remarkably, our method converges within just 100 iterations and achieves a per-frame processing time of 2 seconds on a single NVIDIA RTX 4070 GPU.
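The split between observable and unobservable motion can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera, a control point whose viewing ray coincides with the camera's +z axis, and a local parallel-ray model, so pixel flow scales to a metric displacement by depth / focal length. The function names (`backproject_flow`, `compose_translation`) and the scalar `along_ray`, which stands in for the component that would be learned by gradient descent, are hypothetical.

```python
import numpy as np

def backproject_flow(flow_px, depth, focal_px):
    """Observable part: lift a 2D pixel flow to a 3D displacement that is
    parallel to the image plane (assumed parallel-ray model, ray along +z)."""
    scale = depth / focal_px  # metres per pixel at this depth
    return np.array([flow_px[0] * scale, flow_px[1] * scale, 0.0])

def compose_translation(flow_px, depth, focal_px, along_ray):
    """Full translation = observable part from flow + unobservable part
    along the viewing ray (a scalar that would be optimized, not observed)."""
    ray = np.array([0.0, 0.0, 1.0])  # assumed viewing direction in camera space
    return backproject_flow(flow_px, depth, focal_px) + along_ray * ray

# Hypothetical numbers: 4 px right, 2 px up, point 2 m away, f = 1000 px.
t = compose_translation(np.array([4.0, -2.0]), depth=2.0, focal_px=1000.0, along_ray=0.05)
```

In this sketch only `along_ray` would need to be optimized, which is the sense in which taking the in-plane component from optical flow reduces the number of free parameters per control point.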

Paper Structure (36 sections, 14 equations, 13 figures, 6 tables)


Figures (13)

  • Figure 1: Method Comparison. Rep. and Manip. are abbreviations for representation and manipulation, respectively.
  • Figure 2: H3D Control Points. To predict dense 3D motion in a sparse manner, we propose H3D control points containing local translation and rotation information. Unlike previous works, which learn all motion information with gradient-based methods, we exploit 2D motion priors derived from optical flow. Both translation and rotation are divided into a projected observable part and a learnable unobservable part. We model the localized light as parallel rays and give a detailed derivation.
  • Figure 3: Auxiliary Diagrams for Local Motion Mapping. (a) An illustration of angles, points, and rays within the neighborhood of $\mathbf{x}_0$. (b) A quantitative depiction of motion projection. (c) A comparison between Gaussians distributed on the ground truth surface and control points located on a biased surface. (d) An illustration of grid sampling. Grid sampling is performed independently for each camera to provide a 2D motion prior for H3D control points.
  • Figure 4: Streaming workflow. The workflow starts by segmenting the scene into a static background and moving objects using a 3D segmentation algorithm. Optical flow is then applied to generate H3D control points. Motion-related attributes of the Gaussians are manipulated on an object-wise basis. To prevent reconstruction failures caused by error accumulation, Gaussian attributes are periodically updated at keyframes, capturing additional scene information as attribute residuals of the Gaussians.
  • Figure 5: Left: Frame 20 of the "coffee_martini" sequence from the Neu3DV dataset. Right: Frame 74 of the "softball" sequence from the CMU-Panoptic dataset.
  • ...and 8 more figures
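
The per-camera grid sampling mentioned in the caption of Figure 3(d) can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: it takes a dense optical flow field per camera as an `(H, W, 2)` array and samples it on a regular grid. The function name `grid_sample_flow`, the `stride` parameter, and the camera names are hypothetical.

```python
import numpy as np

def grid_sample_flow(flow, stride):
    """Sample a dense flow field of shape (H, W, 2) on a regular pixel grid.
    Returns (N, 2) integer pixel coordinates as (x, y) and the (N, 2) flow
    vectors stored at those pixels."""
    h, w = flow.shape[:2]
    ys = np.arange(stride // 2, h, stride)  # grid rows, offset to cell centres
    xs = np.arange(stride // 2, w, stride)  # grid columns
    gx, gy = np.meshgrid(xs, ys)
    pixels = np.stack([gx.ravel(), gy.ravel()], axis=1)
    vectors = flow[gy.ravel(), gx.ravel()]
    return pixels, vectors

# Each camera is sampled independently; the flow fields here are placeholders.
flows = {"cam0": np.zeros((8, 12, 2)), "cam1": np.ones((8, 12, 2))}
priors = {name: grid_sample_flow(f, stride=4) for name, f in flows.items()}
```

Each entry of `priors` holds the sampled pixel positions and their 2D motion for one camera, which is the kind of sparse 2D prior the caption describes being passed to the control points.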