Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction

Xinyu Zhang, Haonan Chang, Yuhan Liu, Abdeslam Boularias

TL;DR

This work tackles the lack of explicit controllability in dynamic scene reconstruction by introducing Motion Blender Gaussian Splatting (MBGS), which uses sparse motion graphs (kinematic trees and deformable graphs) to explicitly drive time-varying 3D Gaussians via dual quaternion skinning. The per-graph link motions are blended into Gaussian motions through a learnable weight painting function, enabling end-to-end optimization from video with differentiable rendering. MBGS achieves state-of-the-art performance on the iPhone dataset and competitive results on HyperNeRF, while enabling novel pose animation, robot demonstration synthesis, and visual planning through explicit graph manipulation. The approach improves interpretability and manipulability of dynamic scene reconstructions, with practical implications for robotics and data-efficient planning, though it also highlights limitations in surface fidelity and motion under strong lighting or fast dynamics.

Abstract

Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction of dynamic scenes. However, existing methods primarily rely on implicit motion representations, such as encoding motions into neural networks or per-Gaussian parameters, which makes it difficult to further manipulate the reconstructed motions. This lack of explicit controllability limits existing methods to replaying recorded motions only, which hinders wider application in robotics. To address this, we propose Motion Blender Gaussian Splatting (MBGS), a novel framework that uses motion graphs as an explicit and sparse motion representation. The motion of a graph's links is propagated to individual Gaussians via dual quaternion skinning, with learnable weight painting functions that determine the influence of each link. The motion graphs and 3D Gaussians are jointly optimized from input videos via differentiable rendering. Experiments show that MBGS achieves state-of-the-art performance on the highly challenging iPhone dataset while being competitive on HyperNeRF. We demonstrate the application potential of our method in animating novel object poses, synthesizing real robot demonstrations, and predicting robot actions through visual planning. The source code, models, and video demonstrations can be found at http://mlzxy.github.io/motion-blender-gs.
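
To make the skinning step in the abstract concrete, the following is a minimal sketch of dual quaternion linear blending, in which several link motions are combined into one rigid motion for a single Gaussian center. It is not the authors' implementation: the function names (`dq_from_rt`, `dq_blend`, `dq_apply`), the `(w, x, y, z)` quaternion layout, and the fixed blend weights are assumptions made for illustration. In MBGS the weights would come from the learnable weight painting functions, whose form is not given here.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    """Conjugate of a quaternion (inverse for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def dq_from_rt(rot, trans):
    """Dual quaternion (real, dual) for 'rotate by rot, then translate by trans'."""
    norm = math.sqrt(sum(c * c for c in rot))
    real = tuple(c / norm for c in rot)
    dual = tuple(0.5 * c for c in qmul((0.0, *trans), real))
    return real, dual

def dq_blend(dqs, weights):
    """Weighted sum of dual quaternions, renormalized by the real part."""
    pivot = dqs[0][0]
    real = [0.0] * 4
    dual = [0.0] * 4
    for (r, d), w in zip(dqs, weights):
        # q and -q encode the same rotation; flip to the pivot's hemisphere.
        sign = -1.0 if sum(a * b for a, b in zip(pivot, r)) < 0.0 else 1.0
        for k in range(4):
            real[k] += sign * w * r[k]
            dual[k] += sign * w * d[k]
    norm = math.sqrt(sum(c * c for c in real))
    return tuple(c / norm for c in real), tuple(c / norm for c in dual)

def dq_apply(dq, point):
    """Apply the rigid motion encoded by a dual quaternion to a 3D point."""
    real, dual = dq
    rotated = qmul(qmul(real, (0.0, *point)), qconj(real))
    trans = qmul(dual, qconj(real))  # vector part holds half the translation
    return tuple(rotated[k] + 2.0 * trans[k] for k in (1, 2, 3))

# Hypothetical example: a point influenced equally by a static link and a
# link that moved 2 units along x ends up halfway between the two motions.
static_link = dq_from_rt((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
moved_link = dq_from_rt((1.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
blended = dq_blend([static_link, moved_link], [0.5, 0.5])
print(dq_apply(blended, (0.0, 0.0, 0.0)))
```

Blending in dual quaternion space and renormalizing keeps the result a rigid motion, which is the usual reason for preferring it over blending transformation matrices directly.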

Paper Structure

This paper contains 16 sections, 6 equations, 20 figures, 2 tables.

Figures (20)

  • Figure 1: Capabilities of Our Framework. Our method reconstructs and renders dynamic scenes into 3D Gaussians and motion graphs from input videos. The learned motion graphs for a hand and cat are shown with their corresponding rendered scenes (left). Our approach enables three key applications (right): ➊ Novel pose animation through motion graph editing, ➋ Robot demonstration synthesis by using robot kinematic chains as motion graphs, and ➌ Predicting robot actions by simulating graph movements to minimize the difference between rendered and goal images.
  • Figure 1: Novel view rendering on the highly challenging iPhone dataset [wang2024shape]. LPIPS more accurately reflects perceptual quality.
  • Figure 2: Motion Blender Gaussian Splatting. Our framework explicitly represents motion using sparse dynamic graphs. Static 3D Gaussians are associated with the graphs through learnable weight painting. Then, link-wise motions are propagated to the Gaussians through motion blending with dual quaternion skinning. We employ two motion graph types: kinematic trees, ideal for capturing articulated structures like human bodies, and deformable graphs, designed for modeling non-rigid deformations in soft objects. The parameters of the motion graph, weight painting functions, and 3D Gaussians are jointly optimized, end-to-end, via differentiable rendering.
  • Figure 2: HyperNeRF [park2021hypernerf]. Our method performs competitively, closely matching SoTA in the key LPIPS metric.
  • Figure 3: Motion Graphs. A kinematic tree (left) uses time-independent link lengths $\ell$ and dynamic joint rotations $\mathbf{r}_t \in \operatorname{SO}(3)$. Link poses (shown as colored coordinate axes) in world coordinates are computed through forward kinematics. A deformable graph (right) employs free-form topology parameterized by joint positions $\{\mathbf{n}_{i,t}\}$ and has non-rigid link deformations. Rigid per-link poses are obtained relative to each Gaussian position $\mathbf{x}_0$ and look-at transformation as in Eq. \ref{eq:project-point} and Eq. \ref{eq:lookat}.
  • ...and 15 more figures
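
The forward kinematics mentioned in the Figure 3 caption can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's code: the names `forward_kinematics` and `rot_z`, the parent-index encoding of the tree, and the convention that each link extends along its local x-axis are all hypothetical. Only the ingredients come from the caption: fixed link lengths $\ell$, per-frame joint rotations, and link poses expressed in world coordinates.

```python
import math

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def forward_kinematics(parents, lengths, local_rots, root_pos=(0.0, 0.0, 0.0)):
    """Compute world poses for every link of a kinematic tree.

    parents[i]    : index of the parent link, or -1 for a root link
                    (parents are assumed to be listed before children).
    lengths[i]    : time-independent length of link i.
    local_rots[i] : 3x3 rotation of link i relative to its parent at this frame.
    Returns a list of (world_rotation, start_point, end_point) per link.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    poses = []
    for i, parent in enumerate(parents):
        if parent < 0:
            parent_rot, start = identity, list(root_pos)
        else:
            # A child link starts where its parent link ends.
            parent_rot, _, start = poses[parent]
        world_rot = matmul3(parent_rot, local_rots[i])
        # Assumption: the link points along its own x-axis, so its tip is
        # the start point plus length times the first column of world_rot.
        end = [start[k] + lengths[i] * world_rot[k][0] for k in range(3)]
        poses.append((world_rot, start, end))
    return poses

# Hypothetical two-link chain: the first link is bent up by 90 degrees and
# the second is bent back by 90 degrees relative to the first.
chain = forward_kinematics(
    parents=[-1, 0],
    lengths=[1.0, 1.0],
    local_rots=[rot_z(math.pi / 2), rot_z(-math.pi / 2)],
)
for world_rot, start, end in chain:
    print(start, end)
```

Because only the joint rotations change over time, editing a single rotation moves every descendant link, which is what makes this representation convenient for pose editing.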