Dynamic Gaussian Scene Reconstruction from Unsynchronized Videos
Zhixin Xu, Hengyu Zhou, Yuan Liu, Wenhan Xue, Hao Pan, Wenping Wang, Bin Wang
TL;DR
This work reconstructs dynamic scenes with 4D Gaussian Splatting (4DGS) from unsynchronized multi-view videos by introducing a coarse-to-fine temporal alignment module. The module jointly estimates per-camera time shifts, combining a coarse frame-level search using LoFTR and RANSAC with a learnable sub-frame refinement, and is designed as a plug-in for existing 4DGS frameworks. On challenging DyNeRF-based data, it significantly improves baseline methods built on both neural deformation and direct 4D representations, and it remains robust to substantial time misalignment. The approach makes 4D dynamic capture more practical by enabling high-quality reconstruction with more flexible, lower-cost camera setups, and the ablations show that coarse and fine temporal alignment play complementary roles.
Abstract
Multi-view video reconstruction plays a vital role in computer vision, enabling applications in film production, virtual reality, and motion analysis. While recent advances such as 4D Gaussian Splatting (4DGS) have demonstrated impressive capabilities in dynamic scene reconstruction, they typically assume that the input video streams are temporally synchronized. In real-world scenarios, this assumption often fails due to factors like camera trigger delays or independent recording setups, leading to temporal misalignment across views and reduced reconstruction quality. To address this challenge, we propose a novel temporal alignment strategy for high-quality 4DGS reconstruction from unsynchronized multi-view videos. Our method features a coarse-to-fine alignment module that estimates and compensates for each camera's time shift: it first determines a coarse, frame-level offset and then refines it to sub-frame accuracy. The module can be readily integrated into existing 4DGS frameworks, making them more robust to asynchronous data. Experiments show that our approach effectively handles temporally misaligned videos and significantly improves baseline methods.
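The coarse-to-fine idea can be illustrated with a toy sketch. This is not the paper's implementation: each video is reduced to a hypothetical one-value-per-frame signal, and a brute-force search over a mean squared error stands in for both the LoFTR/RANSAC frame-level search and the learnable sub-frame refinement. The names `mse`, `coarse_offset`, `fine_offset`, and `signal` are assumptions made for illustration only.

```python
import math

def mse(ref, other, shift):
    """Mean squared error between ref[i] and `other` sampled at i + shift.

    Fractional positions are read with linear interpolation; samples that
    would need data outside `other` are skipped.
    """
    total, count = 0.0, 0
    for i, r in enumerate(ref):
        x = i + shift
        j = math.floor(x)
        if j < 0 or j + 1 >= len(other):
            continue  # needs a frame pair that does not exist in `other`
        frac = x - j
        value = (1.0 - frac) * other[j] + frac * other[j + 1]
        total += (r - value) ** 2
        count += 1
    return total / count if count else math.inf

def coarse_offset(ref, other, max_shift):
    """Coarse stage: best integer (frame-level) shift within +/- max_shift."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: mse(ref, other, s))

def fine_offset(ref, other, coarse, steps=100):
    """Fine stage: best sub-frame correction in [-0.5, 0.5] around `coarse`."""
    candidates = [-0.5 + k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda d: mse(ref, other, coarse + d))

def signal(t):
    """Hypothetical smooth per-frame scene descriptor (illustration only)."""
    return math.sin(0.3 * t) + 0.5 * math.sin(0.11 * t)

# Reference camera samples the scene at integer frame times; the second
# camera lags by 3.4 frames (an arbitrary value chosen for this example).
ref = [signal(i) for i in range(100)]
other = [signal(i - 3.4) for i in range(100)]

coarse = coarse_offset(ref, other, max_shift=10)
fine = fine_offset(ref, other, coarse)
print("coarse offset (frames):", coarse)
print("refined offset (frames):", round(coarse + fine, 2))
```

In this sketch the coarse stage only narrows the search to the nearest frame, and the fine stage recovers the fractional remainder, which mirrors the complementary roles the abstract attributes to the two stages.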
