Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang, Lu Xiao, Luyang Tang, Jinbo Yan, Kaiqiang Xiong, Ronggang Wang
TL;DR
Swift4D tackles dynamic scene novel view synthesis by dividing Gaussian splats into dynamic and static components and applying temporal modeling only to the dynamic subset. A compact 4DHash-based spatio-temporal encoder paired with a multi-head deformation decoder models deformations of dynamic Gaussians, while temporal pruning removes floaters and mitigates coupling between canonical and deformation spaces. The method achieves state-of-the-art rendering quality with significantly reduced training time (often minutes) and storage (as low as $30$ MB) on real-world datasets, demonstrating fast convergence and practicality for dynamic scenes. The approach offers a plug-and-play module for existing dynamic methods and emphasizes efficient allocation of compute to genuinely dynamic regions, enabling scalable 4D reconstruction.
Abstract
Novel view synthesis has long been a practical but challenging task. Although numerous methods have been introduced to solve this problem, even those that build on advanced representations such as 3D Gaussian Splatting still struggle to recover high-quality results and often consume too much storage and training time. In this paper, we propose Swift4D, a divide-and-conquer 3D Gaussian Splatting method that handles static and dynamic primitives separately, achieving a good trade-off between rendering quality and efficiency. It is motivated by the fact that most of a scene consists of static primitives, which do not require additional dynamic properties. Concretely, we model dynamic transformations only for the dynamic primitives, which benefits both efficiency and quality. We first employ a learnable decomposition strategy to separate the primitives, which relies on an additional parameter to classify primitives as static or dynamic. For the dynamic primitives, we employ a compact multi-resolution 4D Hash mapper to transform these primitives from canonical space into deformation space at each timestamp, and then mix the static and dynamic primitives to produce the final output. This divide-and-conquer method facilitates efficient training and reduces storage redundancy. Our method achieves state-of-the-art rendering quality while training 20× faster than previous SOTA methods, with a minimum storage requirement of only 30 MB on real-world datasets. Code is available at https://github.com/WuJH2001/swift4d.
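
As a rough illustration of the divide-and-conquer idea described in the abstract, the following Python sketch classifies primitives with a per-primitive parameter, deforms only the dynamic subset through a hashed space-time lookup, and leaves static primitives untouched. It is a minimal sketch under stated assumptions, not the released implementation: the names `split_primitives`, `hash4d`, and `deform`, the sigmoid threshold, the hash primes, and the single-resolution nearest-vertex lookup are all hypothetical simplifications. The method itself uses a multi-resolution 4D Hash mapper and a learned decoder, which are omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_primitives(dynamic_logit, threshold=0.5):
    """Boolean mask: True where a primitive is treated as dynamic.
    The per-primitive logit stands in for the additional learnable parameter."""
    return sigmoid(dynamic_logit) > threshold

def hash4d(coords, table_size):
    """Hash non-negative integer (x, y, z, t) grid coordinates to table indices.
    XOR-of-products hash; the primes are an assumption for illustration."""
    primes = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint64)
    h = np.zeros(coords.shape[0], dtype=np.uint64)
    for d in range(4):
        h ^= coords[:, d].astype(np.uint64) * primes[d]
    return (h % np.uint64(table_size)).astype(np.int64)

def deform(positions, t, mask, table, resolution=16):
    """Offset only the dynamic primitives at time t; static ones pass through.
    Positions and t are assumed to lie in [0, 1]."""
    out = positions.copy()
    dyn = positions[mask]
    if dyn.shape[0] == 0:
        return out
    # Append the timestamp to form (x, y, z, t) samples.
    xyzt = np.concatenate([dyn, np.full((dyn.shape[0], 1), t)], axis=1)
    # Snap to a single-resolution grid (the method uses several resolutions).
    grid = np.floor(np.clip(xyzt, 0.0, 1.0) * (resolution - 1)).astype(np.int64)
    idx = hash4d(grid, table.shape[0])
    # Table rows stand in for decoded position offsets.
    out[mask] = dyn + table[idx]
    return out

# Usage: five primitives, two of which are classified as dynamic.
rng = np.random.default_rng(0)
positions = rng.random((5, 3))
logits = np.array([-3.0, 2.0, -1.0, 4.0, 0.0])
mask = split_primitives(logits)
table = rng.normal(scale=0.01, size=(64, 3))
deformed = deform(positions, t=0.5, mask=mask, table=table)
```

Because the lookup runs only over the masked subset, the per-frame cost in this sketch scales with the number of dynamic primitives rather than with the whole scene, which is the efficiency argument the abstract makes.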
