LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates

Minkwan Kim, Seungmin Lee, Junho Kim, Young Min Kim

TL;DR

LTGS addresses long-term scene evolution under sparse captures by updating an initial Gaussian splatting reconstruction with object-level change templates. It combines change detection from semantic and photometric cues, object-template extraction, and per-object pose-aware optimization to fuse time-varying objects with a static background. Experiments on synthetic and real-world datasets show higher reconstruction quality and more efficient updates than NeRF and Gaussian splatting baselines, especially for abrupt object insertions, removals, and relocations. The object-centric design provides scalable, reusable priors for digital twins, robotics, and location-based services; future work will extend it to non-rigid changes and lighting variations.
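The TL;DR mentions change detection from semantic and photometric cues. As a minimal sketch, assuming the post-change capture and the rendering of the initial reconstruction are pixel-aligned (same camera pose) and that per-pixel semantic features come from some pretrained image encoder (not specified in this summary), the two cues could be fused into a binary change mask as below. The function name `change_mask`, the thresholds `photo_thresh` and `sem_thresh`, and the rule that both cues must agree are illustrative assumptions, not the settings used by LTGS.

```python
import numpy as np


def change_mask(rendered_rgb, captured_rgb, rendered_feat, captured_feat,
                photo_thresh=0.15, sem_thresh=0.3):
    """Hypothetical per-pixel change detector combining two cues.

    rendered_rgb, captured_rgb: (H, W, 3) float images in [0, 1].
    rendered_feat, captured_feat: (H, W, C) semantic feature maps; the
        encoder producing them is an assumption of this sketch.
    Returns an (H, W) boolean mask of pixels flagged as changed.
    """
    # Photometric cue: mean absolute RGB difference per pixel.
    photo_err = np.abs(rendered_rgb - captured_rgb).mean(axis=-1)

    # Semantic cue: cosine distance between per-pixel feature vectors.
    eps = 1e-8
    dot = (rendered_feat * captured_feat).sum(axis=-1)
    norms = (np.linalg.norm(rendered_feat, axis=-1)
             * np.linalg.norm(captured_feat, axis=-1) + eps)
    sem_dist = 1.0 - dot / norms

    # Assumed fusion rule: flag a pixel only when both cues agree, so a pure
    # color shift (e.g. lighting) with unchanged semantics is not flagged.
    return (photo_err > photo_thresh) & (sem_dist > sem_thresh)
```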

Abstract

Recent advances in novel-view synthesis can create photo-realistic visualizations of real-world environments from conventional camera captures. However, acquiring everyday environments from casual captures faces challenges due to frequent scene changes, which require dense observations both spatially and temporally. We propose long-term Gaussian scene chronology from sparse-view updates, coined LTGS, an efficient scene representation that can embrace everyday changes from highly under-constrained casual captures. Given an incomplete and unstructured Gaussian splatting representation obtained from an initial set of input images, we robustly model the long-term chronology of the scene despite abrupt movements and subtle environmental variations. We construct objects as template Gaussians, which serve as structural, reusable priors for shared object tracks. The object templates then undergo a refinement pipeline that modulates the priors to adapt to temporally varying environments based on few-shot observations. Once trained, our framework generalizes across multiple time steps through simple transformations, significantly improving scalability for the temporal evolution of 3D environments. Because existing datasets do not explicitly capture long-term real-world changes under a sparse capture setup, we collect real-world datasets to evaluate the practicality of our pipeline. Experiments demonstrate that our framework achieves superior reconstruction quality compared to other baselines while enabling fast and lightweight updates.
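The abstract says the trained framework generalizes across time steps through simple transformations. Assuming those transformations are rigid object poses (R, t) applied to template Gaussians, which fits the rigid insertions, removals, and relocations named in the TL;DR, a hypothetical sketch of posing templates and compositing them with the static background for one time step is shown below. The dictionary layout (`means`, `rots` as 3x3 orientation matrices, `scales`, `opacities`) and the helper names are assumptions for illustration; standard Gaussian splatting stores orientations as quaternions, but the covariance rule Sigma' = R Sigma R^T is the same.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the z-axis (helper used in the usage example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def covariances(rots, scales):
    """Sigma_i = R_i diag(s_i^2) R_i^T for N Gaussians.

    rots: (N, 3, 3) per-Gaussian orientation matrices; scales: (N, 3).
    """
    # Scaling the rows of R_i^T by s_i^2 equals diag(s_i^2) @ R_i^T.
    return rots @ (scales[:, :, None] ** 2 * np.swapaxes(rots, 1, 2))


def place_template(template, R, t):
    """Rigidly move template Gaussians by the pose (R, t).

    Centers map to R @ mu + t and orientations to R @ R_i, so every covariance
    becomes R @ Sigma_i @ R.T. Scales and opacities are pose-invariant and are
    copied unchanged. View-dependent color (spherical harmonics) would also
    need rotating in a full renderer; this sketch ignores it.
    """
    placed = dict(template)  # shallow copy; only replaced arrays differ
    placed["means"] = template["means"] @ R.T + t
    placed["rots"] = R @ template["rots"]  # (3, 3) broadcasts over (N, 3, 3)
    return placed


def compose_scene(background, templates, poses):
    """Gaussians for one time step: static background plus posed objects.

    poses maps an object id to its (R, t) at this time step; an object that is
    absent at this time step (e.g. removed) is simply left out of poses.
    """
    parts = [background] + [place_template(templates[k], *poses[k]) for k in poses]
    return {key: np.concatenate([p[key] for p in parts], axis=0) for key in background}
```

Under this assumption, each additional time step needs only one pose per object rather than a new reconstruction, which is one possible reading of the scalability claim.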

Paper Structure

This paper contains 23 sections, 9 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: We introduce LTGS to efficiently update the Gaussian reconstruction of the initial environments. Given the spatio-temporally sparse post-change images, our framework tracks object-level changes in 3D and enables modeling scenes with long-term changes.
  • Figure 2: Method overview. We propose an integrated pipeline to update an initial reconstruction given a collection of post-change captures. Our pipeline first finds the camera locations of the input captures and compares each capture against the rendering of the initial reconstruction from the same viewpoint to detect object-level changes. We aggregate the changed objects detected from multiple viewpoints and different timestamps to create 3D Gaussian templates. Finally, we update the temporal scenes by compositing the object-level templates at their respective states with the background.
  • Figure 3: Qualitative comparisons of our method. We illustrate the results of our method using the CL-NeRF dataset and our dataset.
  • Figure 4: Visual comparison of ablation study.
  • Figure 5: Object template visualization. We sample several captures from the initial state and the post-change captures, along with their corresponding object-level Gaussian templates.
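Figure 2 describes aggregating changed objects detected from multiple viewpoints into 3D Gaussian templates. One plausible way to do this, not confirmed as the LTGS procedure, is multi-view mask voting: project each Gaussian center into every posed view and keep it if it falls inside the 2D change mask in a sufficient fraction of the views where it is visible. The pinhole model with intrinsics `K` and a 4x4 world-to-camera matrix, the function names, and `min_ratio` are assumptions of this sketch.

```python
import numpy as np


def project(points, K, world_to_cam):
    """Pinhole projection of (N, 3) world points to (N, 2) pixels plus depth."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    depth = cam[:, 2]
    uv = (K @ cam.T).T
    # Clip depth to avoid division by zero; points behind the camera are
    # rejected later through the depth > 0 visibility test.
    uv = uv[:, :2] / np.clip(depth[:, None], 1e-8, None)
    return uv, depth


def vote_object_gaussians(centers, views, min_ratio=0.6):
    """Select Gaussians whose centers land in the change masks often enough.

    centers: (N, 3) Gaussian centers.
    views: list of (K, world_to_cam, mask) with mask an (H, W) bool array.
    Returns an (N,) bool array; Gaussians never visible are not selected.
    """
    hits = np.zeros(len(centers))
    seen = np.zeros(len(centers))
    for K, w2c, mask in views:
        H, W = mask.shape
        uv, depth = project(centers, K, w2c)
        u = np.floor(uv[:, 0]).astype(int)
        v = np.floor(uv[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        seen += visible
        hits[visible] += mask[v[visible], u[visible]]
    # Fraction of visible views in which the center fell inside the mask.
    ratio = np.divide(hits, seen, out=np.zeros_like(hits), where=seen > 0)
    return ratio >= min_ratio
```

The selected Gaussians could seed an object template; a practical version would also handle occlusion (for example by comparing against rendered depth), which this sketch omits.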