STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering
Zehao Li, Hao Jiang, Yujun Cai, Jianing Chen, Baolong Bi, Shuqin Gao, Honglong Zhao, Yiwei Wang, Tianlu Mao, Zhaoqi Wang
TL;DR
This work identifies initialization-induced spatio-temporal incoherence as a key bottleneck in dynamic scene reconstruction with 3D Gaussian Splatting. It introduces STDR, a plug-and-play module that explicitly decouples spatial and temporal patterns using per-Gaussian spatio-temporal masks, a separated deformation field, and spatio-temporal consistency regularization. By learning temporal activations and factorizing motion from geometry, STDR yields clearer temporal alignment and robust dynamic representations, improving PSNR, SSIM, and LPIPS across synthetic and real benchmarks. The approach enhances both reconstruction quality and temporal stability, with practical impact on real-time rendering and robust dynamic scene understanding. STDR remains compatible with existing 3DGS pipelines and demonstrates strong generalization across diverse motion, topology changes, and real-world conditions.
Abstract
Although dynamic scene reconstruction has long been a fundamental challenge in 3D vision, the recent emergence of 3D Gaussian Splatting (3DGS) offers a promising direction by enabling high-quality, real-time rendering through explicit Gaussian primitives. However, existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal incoherence during initialization, where canonical Gaussians are constructed by aggregating observations from multiple frames without temporal distinction. This results in spatio-temporally entangled representations, making it difficult to model dynamic motion accurately. To overcome this limitation, we propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each Gaussian. STDR introduces a spatio-temporal mask, a separated deformation field, and a consistency regularization to jointly disentangle spatial and temporal patterns. Extensive experiments demonstrate that incorporating our module into existing 3DGS-based dynamic scene reconstruction frameworks leads to notable improvements in both reconstruction quality and spatio-temporal consistency across synthetic and real-world benchmarks.
