
SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero

TL;DR

SWinGS addresses dynamic scene rendering by extending 3D Gaussian Splatting (3DGS) with temporally-local canonical spaces organized into adaptive sliding windows. For each window, an independent dynamic 3DGS model is trained, with a temporally-local deformation field learned by a tunable MLP whose per-Gaussian blending parameters separate static and dynamic components. An adaptive windowing strategy based on optical-flow magnitudes, together with a self-supervised temporal consistency fine-tuning stage, suppresses inter-window flickering while preserving visual quality. The approach yields high-quality, temporally coherent renderings at real-time rates on challenging multi-view datasets, outperforming several state-of-the-art methods in PSNR, SSIM, and perceptual video quality metrics. Overall, SWinGS provides a scalable framework for dynamic NeRF-like novel view synthesis using dynamic 3D Gaussians and differentiable rasterization.
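The adaptive windowing strategy can be illustrated with a small sketch. It assumes that per-frame mean optical-flow magnitudes are already available (how they are computed is not covered here) and applies a hypothetical greedy rule: a window grows until its accumulated motion would exceed a budget, so high-motion segments receive shorter windows. The function name `partition_windows` and the `motion_budget`, `min_len`, `max_len` and `overlap` parameters are illustrative assumptions, not the paper's exact criterion.

```python
from typing import List, Sequence, Tuple


def partition_windows(
    flow_mag: Sequence[float],
    motion_budget: float,
    min_len: int = 2,
    max_len: int = 50,
    overlap: int = 1,
) -> List[Tuple[int, int]]:
    """Greedily split a sequence into windows of bounded accumulated motion.

    Hypothetical rule (not the paper's exact criterion): flow_mag[t] is the mean
    optical-flow magnitude between frames t and t+1, so len(flow_mag) + 1 frames
    are covered. A window grows until adding the next frame would push its
    accumulated motion above motion_budget, subject to min_len / max_len.
    Consecutive windows share `overlap` frames.
    Returns inclusive (start, end) frame indices.
    """
    if not 0 <= overlap < min_len <= max_len:
        raise ValueError("require 0 <= overlap < min_len <= max_len")
    num_frames = len(flow_mag) + 1
    windows: List[Tuple[int, int]] = []
    start = 0
    while True:
        end, motion = start, 0.0
        while end + 1 < num_frames:
            new_len = end + 2 - start      # window length if frame end+1 is added
            step = flow_mag[end]           # motion between frames end and end+1
            if new_len > max_len:
                break
            if new_len > min_len and motion + step > motion_budget:
                break
            motion += step
            end += 1
        windows.append((start, end))
        if end == num_frames - 1:
            return windows
        start = end - overlap + 1          # next window reuses the last `overlap` frames


# Example: low motion at both ends, high motion in the middle.
flow = [1, 1, 1, 1, 5, 5, 5, 5, 1, 1]
print(partition_windows(flow, motion_budget=3.0))
# → [(0, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 10)]
```

Under this rule the quiet segments at the start and end form longer windows, while the high-motion middle is split into minimum-length windows; the shared frames are where an inter-window consistency term could later be applied.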

Abstract

Novel view synthesis has progressed rapidly in recent years, with methods producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tunable parameters weight each Gaussian's respective MLP parameters, improving the modelling of dynamics in imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller, manageable windows to handle sequences of arbitrary length while maintaining high rendering quality. We propose an adaptive sampling strategy that determines appropriate window size hyperparameters based on the scene's motion, balancing training overhead against visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced by a fine-tuning step with a self-supervised consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real time in our dynamic interactive viewer.
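A minimal NumPy sketch of a per-Gaussian tunable deformation MLP, under stated assumptions. The abstract states only that tunable parameters weight each Gaussian's MLP parameters; here we assume, in the spirit of tunable layers, that each linear layer holds two parameter sets blended linearly by a per-Gaussian α in [0, 1], and that the MLP maps a canonical position plus a window-local time to a displacement Δx. The class `TunableDeformMLP`, the helper `deform`, the two-set blending and the raw (x, y, z, t) input encoding are all hypothetical. In training, α would be a learnable per-Gaussian parameter; here it is passed in directly.

```python
import numpy as np


class TunableDeformMLP:
    """Hypothetical tunable MLP: a per-Gaussian alpha blends two parameter sets.

    Layer k holds (W0_k, b0_k) and (W1_k, b1_k). For Gaussian i the effective
    layer is W_k(alpha_i) = (1 - alpha_i) * W0_k + alpha_i * W1_k (same for b).
    A linear layer is linear in its parameters, so this equals blending the two
    layer outputs, which avoids building one weight matrix per Gaussian.
    """

    def __init__(self, dims, seed=0):
        rng = np.random.default_rng(seed)

        def make(fan_in, fan_out):
            W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
            return W, np.zeros(fan_out)

        self.layers = [(make(i, o), make(i, o)) for i, o in zip(dims[:-1], dims[1:])]

    def __call__(self, x, alpha):
        # x: (N, d_in) per-Gaussian inputs; alpha: (N,) blending weights in [0, 1]
        a = np.asarray(alpha, dtype=float)[:, None]
        h = x
        for k, ((W0, b0), (W1, b1)) in enumerate(self.layers):
            h = (1.0 - a) * (h @ W0 + b0) + a * (h @ W1 + b1)
            if k < len(self.layers) - 1:   # ReLU on hidden layers only
                h = np.maximum(h, 0.0)
        return h


def deform(mlp, canonical_xyz, alpha, t_local):
    """Per-frame positions = canonical positions + predicted displacement Δx."""
    n = canonical_xyz.shape[0]
    inp = np.concatenate([canonical_xyz, np.full((n, 1), t_local)], axis=1)
    return canonical_xyz + mlp(inp, alpha)


# Usage: 5 Gaussians, input (x, y, z, t) -> displacement (dx, dy, dz).
rng = np.random.default_rng(1)
mlp = TunableDeformMLP([4, 16, 3])
xyz = rng.normal(size=(5, 3))
alpha = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(deform(mlp, xyz, alpha, t_local=0.5).shape)  # (5, 3)
```

The output-blending identity is what keeps a per-Gaussian weighting cheap in this sketch: each layer costs two matrix products for the whole batch rather than one weight matrix per Gaussian.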

Paper Structure

This paper contains 14 sections, 8 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Left: SWinGS achieves sharper dynamic 3D scene reconstruction, in part thanks to a sliding-window canonical space that reduces the complexity of 3D motion estimation. Right: our dynamic real-time viewer allows users to explore the scene.
  • Figure 2: Method. First, the sequence is partitioned into sliding windows based on optical flow. Second, a dynamic 3DGS model is trained per window, where tunable MLPs model the deformations. Blending parameters $\boldsymbol{\alpha}$ weight the MLP's parameters to focus on dynamic parts. Finally, each model is fine-tuned, enforcing inter-window temporal consistency with a consistency loss on sampled views of overlapping frames.
  • Figure 2: Quantitative results on the Technicolor dataset [Sabater2017] at full resolution. Best and second-best results are highlighted.
  • Figure 3: Dynamic MLPs with tunable parameters $\boldsymbol{\alpha}$, which weight the parameters of the MLP for each Gaussian. We show renders from two scenes: left, cook spinach [li2022a]; right, Train [Sabater2017]. From left to right: rendered image, tunable $\boldsymbol{\alpha}$ parameters, and normalized MLP displacements $\Delta \boldsymbol{x}$. Note that $\boldsymbol{\alpha}$ highlights the scene's dynamic regions.
  • Figure 4: Comparison of performance consistency between Ours and Dynamic3DG across consecutive frames.
  • ...and 4 more figures
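The fine-tuning stage described in the Figure 2 caption can be sketched as a loss function. In this hypothetical version, for a frame shared by two consecutive windows, a novel camera centre is sampled by interpolating between two training camera centres, both window models render that view, and the current model is penalised by the L1 difference to the previous window's render, which is treated as a fixed pseudo-target. The renderer callables, the interpolation-based view sampling, the L1 choice and the names `sample_novel_centre` and `consistency_loss` are assumptions for illustration; a real implementation would use the differentiable 3DGS rasterizer with full camera poses.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical renderer signature: (camera centre, frame index) -> HxWx3 image
Renderer = Callable[[np.ndarray, int], np.ndarray]


def sample_novel_centre(train_centres: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical novel-view sampler: random point between two training cameras."""
    i, j = rng.choice(len(train_centres), size=2, replace=False)
    w = rng.uniform()
    return (1.0 - w) * train_centres[i] + w * train_centres[j]


def consistency_loss(
    render_prev: Renderer,
    render_curr: Renderer,
    overlap_frames: Sequence[int],
    train_centres: np.ndarray,
    rng: np.random.Generator,
    views_per_frame: int = 4,
) -> float:
    """Mean L1 between current- and previous-window renders of sampled novel views.

    The previous window's render acts as a fixed target (in an autodiff framework
    no gradient would flow into it); only the current model would be updated.
    """
    losses = []
    for t in overlap_frames:
        for _ in range(views_per_frame):
            cam = sample_novel_centre(train_centres, rng)
            target = render_prev(cam, t)
            pred = render_curr(cam, t)
            losses.append(np.abs(pred - target).mean())
    if not losses:
        return 0.0
    return float(np.mean(losses))


# Usage with dummy constant renderers standing in for two window models.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 5.0]])
prev = lambda cam, t: np.full((8, 8, 3), 0.5)
curr = lambda cam, t: np.full((8, 8, 3), 0.6)
print(round(consistency_loss(prev, curr, [10], centres, rng), 6))  # 0.1
```

Because the target views are rendered rather than captured, this term needs no extra ground truth, which is what makes it self-supervised; in practice it would be added, with some weight, to the usual photometric loss on training views.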