SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization
Lifan Wu, Ruijie Zhu, Yubo Ai, Tianzhu Zhang
TL;DR
SkeletonGaussian tackles editable 4D generation from monocular video by introducing a skeleton-driven, hierarchical deformation model for Gaussian splatting. The method splits motion into rigid, skeleton-driven deformation via linear blend skinning and fine non-rigid refinement with a HexPlane+MLP, enabling direct pose-based editing and better interpretability. The pipeline first builds a static 3D Gaussian object, then applies a rigid stage and a non-rigid stage; training uses MV-SDS, reconstruction, and mask losses. On Consistent4D, it achieves higher generation quality than prior methods. The approach integrates smoothly with animation pipelines and enables real-time editing, offering a practical pathway for controllable 4D motion synthesis.
Abstract
4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
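To make the rigid stage concrete, the following is a minimal sketch of linear blend skinning applied to Gaussian centers. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-bone setup, and the hand-picked skinning weights are all hypothetical, and only the centers are deformed (updating Gaussian orientations and the HexPlane-based non-rigid refinement are omitted).

```python
import math

def rotation_z(angle, pivot):
    """Hypothetical helper: 3x4 rigid transform [R | t] rotating about
    the z-axis through `pivot` (e.g. a joint position)."""
    c, s = math.cos(angle), math.sin(angle)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    # t = pivot - R @ pivot, so the pivot itself stays fixed.
    t = [pivot[i] - sum(R[i][j] * pivot[j] for j in range(3)) for i in range(3)]
    return [R[i] + [t[i]] for i in range(3)]

def apply_transform(T, p):
    """Apply a 3x4 rigid transform to a 3D point."""
    return [sum(T[i][j] * p[j] for j in range(3)) + T[i][3] for i in range(3)]

def linear_blend_skinning(centers, weights, bone_transforms):
    """Deform each center as the weighted sum of its per-bone transformed
    positions: p' = sum_b w_b * (R_b p + t_b)."""
    deformed = []
    for p, w in zip(centers, weights):
        q = [0.0, 0.0, 0.0]
        for w_b, T in zip(w, bone_transforms):
            tp = apply_transform(T, p)
            q = [q[i] + w_b * tp[i] for i in range(3)]
        deformed.append(q)
    return deformed

# Assumed toy setup: three Gaussian centers along the x-axis, two bones.
IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
centers = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
# Each row holds one center's skinning weights over the bones (sums to 1).
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Bone 0 stays at rest; bone 1 bends 90 degrees about the joint at x = 1.
bones = [IDENTITY, rotation_z(math.pi / 2, pivot=[1.0, 0.0, 0.0])]

posed = linear_blend_skinning(centers, weights, bones)
```

Editing the pose in this sketch amounts to changing the bone transforms while the centers and weights stay untouched, which is the sense in which a skeleton-driven representation permits direct motion editing.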
