H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting
Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang
TL;DR
H3D-DGS introduces heterogeneous 3D control points (H3D control points) that decouple observable from unobservable 3D motion in deformable 3D Gaussian splatting. Observable motion is derived from optical flow via a local ray framework, so only the unobservable components need to be learned, which yields faster convergence and more robust performance on real-world dynamic scenes. The method is integrated into a streaming pipeline with 3D segmentation, residual compensation, and GoS-based updates. On the Neu3DV and CMU-Panoptic datasets it reports state-of-the-art results, converging in around 100 iterations at about 2 seconds per frame on a single NVIDIA RTX 4070 GPU. The resulting motion representation is compact and improves reconstruction fidelity, and the pipeline supports practical streaming use. The method currently depends on an initial static reconstruction and on multi-view inputs.
Abstract
Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian Splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Scene motion is defined as the collective movement of all Gaussian points, and for compactness, existing approaches commonly adopt implicit neural fields or sparse control points. However, these methods predominantly rely on gradient-based optimization for all motion information. Due to the high degree of freedom, they struggle to converge on real-world datasets exhibiting complex motion. To preserve the compactness of motion representation and address convergence challenges, this paper proposes heterogeneous 3D control points, termed H3D control points, whose attributes are obtained using a hybrid strategy combining optical flow back-projection and gradient-based methods. This design decouples directly observable motion components from those that are geometrically occluded. Specifically, components of 3D motion that project onto the image plane are directly acquired via optical flow back-projection, while unobservable portions are refined through gradient-based optimization. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques. Remarkably, our method converges within just 100 iterations and achieves a per-frame processing speed of 2 seconds on a single NVIDIA RTX 4070 GPU.
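
The split between observable and unobservable motion can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single pinhole camera with a hypothetical intrinsic matrix `K`, a control point expressed in camera coordinates, and hypothetical helper names (`backproject`, `project`, `move_control_point`). Under these assumptions, the optical flow fixes the ray on which the moved point must lie, and the only remaining unknown is a scalar depth change along that ray, which is the part a gradient-based optimizer would have to learn.

```python
import numpy as np


def backproject(pixel, depth, K):
    """Lift a pixel (u, v) at a given depth to a 3D point in camera coordinates."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with z-component equal to 1
    return depth * ray


def project(point, K):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    uvw = K @ point
    return uvw[:2] / uvw[2]


def move_control_point(point, flow, depth_delta, K):
    """Move a control point using observed flow plus an unobserved depth change.

    flow        : 2D optical flow at the point's pixel (observable part).
    depth_delta : change in depth along the new ray (unobservable part;
                  in this sketch it is simply passed in, whereas in practice
                  it would be an optimized parameter).
    """
    new_pixel = project(point, K) + flow   # fixed by the optical flow
    new_depth = point[2] + depth_delta     # free parameter along the ray
    return backproject(new_pixel, new_depth, K)


# Hypothetical intrinsics and inputs, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
point = np.array([0.2, -0.1, 2.0])
flow = np.array([5.0, -3.0])

moved_a = move_control_point(point, flow, 0.0, K)
moved_b = move_control_point(point, flow, 0.3, K)

# Both candidates reproduce the observed flow; they differ only along the ray.
print(project(moved_a, K) - project(point, K))
print(project(moved_b, K) - project(point, K))
```

In this sketch every value of `depth_delta` is consistent with the same 2D flow, which is why that component cannot be read off a single view and is left to optimization, while the in-image displacement needs no learning at all.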
