DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction

Kai Xu, Tze Ho Elden Tse, Jizong Peng, Angela Yao

TL;DR

DAS3R tackles static-background reconstruction from dynamic, unposed videos by predicting dynamic masks from image pairs and integrating them into a dynamics-aware Gaussian Splatting framework. It operates without camera intrinsics or depth information, leveraging global alignment and a staticness attribute within Gaussian rendering to separate static and dynamic content. The method achieves over 2 dB PSNR gains on DAVIS and Sintel and improves camera pose estimation under challenging dynamics. While robust, it can incur false positives in depth-variant regions, suggesting future refinements with more diverse data and optimization strategies.

Abstract

We propose a novel framework for scene decomposition and static background reconstruction from everyday videos. By integrating the trained motion masks and modeling the static scene as Gaussian splats with dynamics-aware optimization, our method achieves more accurate background reconstruction results than previous works. Our proposed method is termed DAS3R, an abbreviation for Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods, DAS3R is more robust in complex motion scenarios, capable of handling videos where dynamic objects occupy a significant portion of the scene, and does not require camera pose inputs or point cloud data from SLAM-based methods. We compared DAS3R against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R demonstrates enhanced performance and robustness with a margin of more than 2 dB in PSNR. The project's webpage can be accessed via https://kai422.github.io/DAS3R/
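
To make the two quantities mentioned in the abstract concrete, the sketch below shows one common way a dynamic mask can be used to restrict a photometric loss to static pixels, together with the standard PSNR formula. This is an illustrative assumption, not the authors' implementation: the function names `masked_l1` and `psnr`, the boolean mask convention (True marks a dynamic pixel), and the choice of an L1 loss are all hypothetical.

```python
import numpy as np


def masked_l1(rendered, target, dynamic_mask):
    """Mean absolute error over static pixels only.

    rendered, target: float arrays of shape (H, W, 3) in [0, 1].
    dynamic_mask: bool array of shape (H, W); True marks a dynamic pixel
    (assumed convention, not taken from the paper).
    """
    static = ~dynamic_mask
    if not static.any():
        # No static pixels to supervise; return a zero loss.
        return 0.0
    diff = np.abs(rendered - target)
    # Boolean indexing with an (H, W) mask selects whole RGB triplets.
    return float(diff[static].mean())


def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((rendered - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Because PSNR is logarithmic in the mean squared error, a gain of 2 dB corresponds to an MSE that is lower by a factor of 10^0.2, roughly 1.58.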

Paper Structure (17 sections, 9 equations, 5 figures, 5 tables)

This paper contains 17 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview. DAS3R reconstructs static scenes from unposed videos in which dynamic objects occupy a significant portion of the scene. A deep network predicts dynamic masks directly from image pairs, and the predicted masks are then used for dynamics-aware Gaussian splatting training. The figure shows an example from the Sintel dataset. Compared to SpotLessSplats (Sabour et al., 2024), DAS3R reconstructs a clean background, whereas SpotLessSplats fails to remove the dynamic object.
  • Figure 2: Dynamic Mask Comparison on DAVIS dataset.
  • Figure 3: Dynamic Mask Comparison on Sintel dataset.
  • Figure 4: Qualitative comparison on the DAVIS dataset. DAS3R achieves the best rendering quality and correctly detects and removes dynamic objects together with their shadows and reflections. SpotLessSplats removes static content (the hurdles) in horsejump-high (4th row of DAVIS).
  • Figure 5: Qualitative comparison on the Sintel dataset. DAS3R is robust to large dynamic objects, while other methods fail to remove the dynamic objects and even fail to reconstruct the overall scene.