Cross-Temporal 3D Gaussian Splatting for Sparse-View Guided Scene Update

Zeyuan An, Yanghang Xiao, Zhiying Leng, Frederick W. B. Li, Xiaohui Liang

TL;DR

Cross-Temporal 3DGS updates 3D scene reconstructions across time from sparse images by aligning camera poses across timestamps, estimating confidence in the scene priors, and progressively integrating historical information. The method combines cross-temporal camera alignment, interference-based confidence initialization, and progressive optimization to update or recover scenes from reduced data. Experiments on synthetic and real datasets show substantial gains over baselines in fidelity, structural consistency, and efficiency. The approach enables non-continuous, sparse-view scene versioning and cross-temporal digital twins with improved data efficiency, offering a practical, scalable solution for long-term spatial documentation and change monitoring.

Abstract

Maintaining consistent 3D scene representations over time is a significant challenge in computer vision. Updating 3D scenes from sparse-view observations is crucial for many real-world applications, including urban planning, disaster assessment, and historical site preservation, where dense scans are often unavailable or impractical. In this paper, we propose Cross-Temporal 3D Gaussian Splatting (Cross-Temporal 3DGS), a novel framework for efficiently reconstructing and updating 3D scenes across different time periods using sparse images and previously captured scene priors. Our approach comprises three stages: 1) cross-temporal camera alignment, which estimates and aligns camera poses across different timestamps; 2) interference-based confidence initialization, which identifies unchanged regions between timestamps to guide updates; and 3) progressive cross-temporal optimization, which iteratively integrates historical prior information into the 3D scene to enhance reconstruction quality. Our method supports non-continuous capture, enabling not only the refinement of existing scenes with new sparse views but also the recovery of past scenes from limited data with the help of current captures. Furthermore, we demonstrate the potential of this approach to capture temporal changes using only sparse images, which can later be reconstructed into detailed 3D representations as needed. Experimental results show significant improvements over baseline methods in reconstruction quality and data efficiency, making this approach a promising solution for scene versioning, cross-temporal digital twins, and long-term spatial documentation.
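The abstract names interference-based confidence initialization as the stage that separates unchanged regions from changed ones, but it does not spell out the computation. As an illustration only, the sketch below swaps in a simple photometric-residual heuristic (not the paper's interference-based method): render the $t_0$ prior at an aligned $t_n$ camera, compare it with the new sparse image, and give each Gaussian the confidence found at its projected centre. The kernel width `sigma`, the `default` value for Gaussians the view cannot see, and all function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def pixel_confidence(rendered: np.ndarray, observed: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Per-pixel confidence in [0, 1]: near 1 where the prior render agrees with the new image.

    Hypothetical heuristic, not the paper's interference-based formulation.
    """
    residual = np.abs(rendered - observed).mean(axis=-1)  # mean absolute RGB error, shape (H, W)
    return np.exp(-((residual / sigma) ** 2))


def gaussian_confidence(centers: np.ndarray, K: np.ndarray, w2c: np.ndarray,
                        conf_map: np.ndarray, default: float = 1.0) -> np.ndarray:
    """Sample conf_map at each Gaussian centre's projection (pinhole model, no occlusion test).

    Gaussians behind the camera or outside the image get `default`, i.e. this view
    provides no evidence about them.
    """
    h, w = conf_map.shape
    homo = np.hstack([centers, np.ones((len(centers), 1))])
    cam = (w2c @ homo.T).T[:, :3]  # world -> camera coordinates
    conf = np.full(len(centers), default, dtype=float)
    in_front = cam[:, 2] > 1e-6
    uvw = (K @ cam[in_front].T).T  # perspective projection (homogeneous pixels)
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]  # indices of Gaussians that land on the image
    conf[idx] = conf_map[v[inside], u[inside]]
    return conf


# Toy example: an 8x8 view in which only the top-left 2x2 patch has changed.
K = np.array([[8.0, 0.0, 4.0], [0.0, 8.0, 4.0], [0.0, 0.0, 1.0]])
w2c = np.eye(4)
rendered = np.zeros((8, 8, 3))  # stand-in for the t_0 prior rendered at the aligned t_n camera
observed = rendered.copy()
observed[:2, :2] = 1.0          # stand-in for the new sparse image at t_n
conf_map = pixel_confidence(rendered, observed)
centers = np.array([[0.0, 0.0, 1.0],      # projects to pixel (4, 4): unchanged region
                    [-0.45, -0.45, 1.0],  # projects into the changed patch
                    [0.0, 0.0, -1.0]])    # behind the camera: keeps the default
gauss_conf = gaussian_confidence(centers, K, w2c, conf_map)
print(gauss_conf)
```

A fuller version would also test visibility against the rendered depth, since without it an occluded Gaussian inherits the confidence of whatever surface hides it.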

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Scene update via Cross-temporal 3DGS. Given sparse-view inputs at $t_n$, our proposed Cross-temporal 3DGS updates the 3D scene from a well-observed timestamp $t_0$ to the sparsely observed timestamp $t_n$.
  • Figure 2: Overall pipeline of Cross-Temporal 3DGS.
  • Figure 3: Cross-Temporal Camera Alignment. This process involves estimating and aligning camera poses from different timestamps ($t_0$ and $t_n$) by registering dense point clouds, ensuring geometric consistency for effective scene updates.
  • Figure 4: Point cloud alignment of $t_0$ and $t_n$. Initial alignment results (left) are further refined using ICP to achieve a more precise transformation (right), ensuring accurate registration of scene structures and camera poses.
  • Figure 5: Qualitative Comparison. We compare four methods (InstantSplat, Baseline, GaussianEditor, Ours) on three scenes from the dataset (Airpods, Tower). For each scene, we show one training view and two testing views to illustrate the effectiveness of each approach. Changed regions are highlighted with boxes.
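Figures 3 and 4 describe registering the dense point clouds of $t_0$ and $t_n$ and then refining an initial alignment with ICP. The exact registration settings are not given here, so the following is a minimal sketch of textbook point-to-point ICP (a Kabsch/SVD rigid fit at each step, brute-force nearest neighbours), assuming both clouds share the same scale and overlap well. The function names, the synthetic demo clouds, and the identity camera pose are illustrative assumptions, not the authors' code.

```python
from __future__ import annotations

import numpy as np


def best_fit_rigid(src: np.ndarray, dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Kabsch/SVD: rotation R and translation t minimising sum ||R @ src_i + t - dst_i||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance of the centred clouds
    U, _, Vt = np.linalg.svd(H)
    # If the SVD solution is a reflection (det = -1), flip the last axis.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source: np.ndarray, target: np.ndarray, init: np.ndarray | None = None,
        max_iters: int = 50, tol: float = 1e-10) -> tuple[np.ndarray, float]:
    """Point-to-point ICP: refine a 4x4 rigid transform T so that T maps source onto target."""
    T = np.eye(4) if init is None else init.copy()
    src = source @ T[:3, :3].T + T[:3, 3]  # apply the initial (coarse) alignment
    prev_err = np.inf
    err = prev_err
    for _ in range(max_iters):
        # Brute-force nearest neighbours, O(N*M) memory; a k-d tree is preferable for real clouds.
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        R, t = best_fit_rigid(src, target[nn])
        src = src @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T  # compose the incremental update onto the running estimate
        err = float(np.linalg.norm(src - target[nn], axis=1).mean())
        if prev_err - err < tol:
            break
        prev_err = err
    return T, err


# Synthetic demo: the t_n cloud is the t_0 cloud under a small, known rigid motion.
rng = np.random.default_rng(0)
pts_t0 = rng.uniform(-1.0, 1.0, size=(100, 3))
theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.015])
pts_tn = pts_t0 @ R_true.T + t_true

T_est, err = icp(pts_t0, pts_tn)

# T_est maps t_0 coordinates into the t_n frame, so a camera-to-world pose
# estimated in the t_n frame is expressed in the t_0 frame via inv(T_est).
c2w_tn = np.eye(4)  # hypothetical camera pose recovered from the sparse t_n views
c2w_in_t0 = np.linalg.inv(T_est) @ c2w_tn
print("max rotation error:", np.abs(T_est[:3, :3] - R_true).max(), "mean residual:", err)
```

ICP only converges locally, which is why a coarse initial alignment (passed as `init`) matters before refinement. If the sparse-view reconstruction at $t_n$ is only defined up to scale, a similarity fit (rotation, translation, and uniform scale) would replace the rigid `best_fit_rigid` step.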