SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting

Yonghan Lee; Tsung-Wei Huang; Shiv Gehlot; Jaehoon Choi; Guan-Ming Su; Dinesh Manocha

TL;DR

SyncTrack4D is a general framework for reconstructing dynamic scenes from unsynchronized multi-view videos, using dense 4D tracks as cues for both cross-video synchronization and 4D Gaussian Splatting (4DGS). It combines Fused Gromov-Wasserstein track matching with Dynamic Time Warping and a motion-spline scaffold to jointly estimate per-video temporal offsets and optimize a unified 4DGS representation. The approach achieves sub-frame synchronization (average temporal error below 0.26 frames) and high-fidelity 4D reconstruction (PSNR of 26.3 on Panoptic Studio), without relying on predefined scene templates. This work extends 4D Gaussian Splatting to unsynchronized multi-view settings, enabling robust dynamic scene capture in unconstrained environments.

Abstract

Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages a dense 4D track representation of dynamic scene parts as a cue for simultaneous cross-video synchronization and 4DGS reconstruction. We first compute dense per-video 4D feature tracks and establish cross-video track correspondences using a Fused Gromov-Wasserstein optimal transport approach. Next, we perform global frame-level temporal alignment to maximize the overlapping motion of matched 4D tracks. Finally, we achieve sub-frame synchronization through our multi-video 4D Gaussian Splatting, which is built upon a motion-spline scaffold representation. The final output is a synchronized 4DGS representation with dense, explicit 3D trajectories and a temporal offset for each video. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender datasets, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames and high-fidelity 4D reconstruction reaching a PSNR of 26.3 on the Panoptic Studio dataset. To the best of our knowledge, our work is the first general 4D Gaussian Splatting approach for unsynchronized video sets that does not assume the existence of predefined scene objects or prior models.
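
The frame-level alignment step can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes each matched track has already been reduced to a 1D per-frame motion signal (for example, track speed), runs textbook Dynamic Time Warping between the two signals, and reads an integer frame offset off the warping path. The function names `dtw_path` and `estimate_frame_offset` and the `min_motion` threshold are hypothetical, introduced only for illustration.

```python
from statistics import median


def dtw_path(a, b):
    """Textbook dynamic time warping between two 1D sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs running from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end to recover one optimal warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def estimate_frame_offset(a, b, min_motion=0.5):
    """Estimate how many frames signal b lags behind signal a.

    Only path pairs where both signals show motion above `min_motion`
    (an assumed, hand-picked threshold) vote, because static frames
    match each other equally well at any offset.
    """
    _, path = dtw_path(a, b)
    offsets = [j - i for i, j in path
               if a[i] > min_motion and b[j] > min_motion]
    if not offsets:
        raise ValueError("no frames with enough motion to align")
    return median(offsets)
```

For instance, if `a` contains a motion pulse starting at frame 5 and `b` contains the same pulse starting at frame 8, `estimate_frame_offset(a, b)` yields 3. An estimate of this kind is only frame-accurate; in the paper, sub-frame offsets come from the subsequent optimization over the motion-spline scaffold.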
