SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero
TL;DR
SWinGS extends 3D Gaussian Splatting to dynamic scenes by partitioning a sequence into adaptive sliding windows, each with its own temporally-local canonical space. Each window trains an independent dynamic 3DGS model whose temporally-local deformation field is learned by an MLP, with tuneable per-Gaussian blending parameters that separate static and dynamic components. An adaptive windowing strategy based on optical-flow magnitudes sets the window sizes, and a self-supervised temporal consistency fine-tuning stage suppresses inter-window flickering while preserving visual quality. The approach yields high-quality, temporally coherent renderings at real-time rates on challenging multi-view datasets, outperforming several state-of-the-art methods in PSNR, SSIM, and perceptual video quality metrics. Overall, SWinGS provides a scalable framework for dynamic novel view synthesis using dynamic 3D Gaussians and differentiable rasterization.
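To make the adaptive windowing idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a per-frame motion score is already available (for example, the mean optical-flow magnitude between consecutive frames) and greedily closes a window once the accumulated motion would exceed a budget. The function name `adaptive_windows` and the parameters `max_motion` and `max_len` are hypothetical, and the windows here do not overlap, which is a simplification.

```python
from typing import List, Tuple


def adaptive_windows(
    flow_mag: List[float], max_motion: float, max_len: int
) -> List[Tuple[int, int]]:
    """Greedily split a sequence into windows of inclusive frame indices.

    flow_mag[i] is an assumed motion score between frame i and frame i + 1,
    so the sequence has len(flow_mag) + 1 frames. A window is closed when
    adding the next frame would push its accumulated motion above
    max_motion, or when it already holds max_len frames.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    n_frames = len(flow_mag) + 1
    windows: List[Tuple[int, int]] = []
    start = 0
    accumulated = 0.0

    for frame in range(1, n_frames):
        step = flow_mag[frame - 1]   # motion between frame - 1 and frame
        length = frame - start       # frames currently in the open window
        if accumulated + step > max_motion or length >= max_len:
            windows.append((start, frame - 1))
            start = frame
            accumulated = 0.0
        else:
            accumulated += step

    windows.append((start, n_frames - 1))
    return windows


if __name__ == "__main__":
    # Slow motion, then a burst of fast motion, then slow motion again.
    scores = [0.1, 0.1, 0.1, 2.0, 2.0, 0.1, 0.1]
    print(adaptive_windows(scores, max_motion=1.0, max_len=4))
```

With these toy scores, the slow segments end up in longer windows and the fast segment is isolated in a short one, which reflects the trade-off between training overhead and visual quality that the windowing strategy targets.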
Abstract
Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary-length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with a self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real time in our dynamic interactive viewer.
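The static/dynamic disentanglement described above can be illustrated with a small sketch. This is a simplification under stated assumptions, not the paper's formulation: the tuneable parameter is modelled as a single scalar weight per Gaussian that scales a predicted displacement, and `toy_deformation` is a hypothetical stand-in for the learned deformation MLP. A weight near 0 keeps a Gaussian at its canonical position (static), while a weight near 1 lets it follow the deformation field (dynamic).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def toy_deformation(mean: Vec3, t: float) -> Vec3:
    """Hypothetical stand-in for a deformation MLP.

    Returns a displacement that sways along the x-axis over normalized
    time t; a real model would predict this from (mean, t).
    """
    return (0.1 * math.sin(2.0 * math.pi * t), 0.0, 0.0)


def deform_gaussians(
    canonical_means: Sequence[Vec3],
    weights: Sequence[float],
    t: float,
    field: Callable[[Vec3, float], Vec3] = toy_deformation,
) -> List[Vec3]:
    """Map canonical Gaussian centers to their positions at time t.

    Each center moves by its per-Gaussian weight times the displacement
    predicted by the deformation field.
    """
    deformed: List[Vec3] = []
    for mean, w in zip(canonical_means, weights):
        delta = field(mean, t)
        deformed.append(
            (mean[0] + w * delta[0], mean[1] + w * delta[1], mean[2] + w * delta[2])
        )
    return deformed


if __name__ == "__main__":
    means = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    weights = [0.0, 1.0]  # first Gaussian static, second fully dynamic
    print(deform_gaussians(means, weights, t=0.25))
```

In a trained model the weights would be optimized jointly with the MLP, so that background Gaussians settle toward zero and the deformation capacity is spent on the regions that actually move.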
