GA-GS: Generation-Assisted Gaussian Splatting for Static Scene Reconstruction

Yedong Shen, Shiqi Zhang, Sha Zhang, Yifan Duan, Xinran Zhang, Wenhao Yu, Lu Zhang, Jiajun Deng, Yanyong Zhang

Abstract

Reconstructing a static 3D scene from monocular video containing dynamic objects is important for numerous applications such as virtual reality and autonomous driving. Current approaches typically rely only on the visible background for static scene reconstruction, limiting their ability to recover regions occluded by dynamic objects. In this paper, we propose GA-GS, a Generation-Assisted Gaussian Splatting method for static scene reconstruction. The key innovation of our work lies in leveraging generation to assist in reconstructing occluded regions. We employ a motion-aware module to segment and remove dynamic regions, and then use a diffusion model to inpaint the occluded areas, providing pseudo-ground-truth supervision. To balance the contributions of the real background and the generated regions, we introduce a learnable authenticity scalar for each Gaussian primitive, which dynamically modulates opacity during splatting for authenticity-aware rendering and supervision. Since no existing dataset provides the ground-truth static scene for a video with dynamic objects, we construct a dataset named Trajectory-Match, using a fixed-path robot to record each scene both with and without dynamic objects, enabling quantitative evaluation of reconstruction in occluded regions. Extensive experiments on both DAVIS and our dataset show that GA-GS achieves state-of-the-art performance in static scene reconstruction, especially in challenging scenarios with large-scale, persistent occlusions.
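
The abstract describes a learnable authenticity scalar attached to each Gaussian primitive that dynamically modulates opacity during splatting. The exact parameterization is not specified above; the following is a minimal PyTorch sketch, assuming the scalar is stored as an unconstrained logit `theta` and squashed by a sigmoid before scaling the base opacity (both the name `theta` and the sigmoid squashing are illustrative assumptions).

```python
import torch

def authenticity_modulated_opacity(opacity: torch.Tensor,
                                   theta: torch.Tensor) -> torch.Tensor:
    """Scale per-Gaussian opacity by a learnable authenticity scalar.

    opacity: (N,) base opacities of the Gaussian primitives, in [0, 1].
    theta:   (N,) unconstrained authenticity logits (assumed form; the
             abstract only states that a learnable scalar modulates
             opacity during splatting).
    Returns the effective opacities passed to the rasterizer.
    """
    authenticity = torch.sigmoid(theta)  # squash logits to [0, 1]
    return opacity * authenticity

# Hypothetical usage: primitives seeded from diffusion-inpainted content
# might be initialized with lower authenticity than real-background ones.
opacity = torch.rand(1024)                     # base opacities
theta = torch.zeros(1024, requires_grad=True)  # sigmoid(0) = 0.5
effective_opacity = authenticity_modulated_opacity(opacity, theta)
```

Because `theta` is learnable, gradients from both real-background and generated-region supervision can push each primitive's effective opacity up or down, which is one way rendering could balance real and generated evidence.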

Figures (5)

  • Figure 1: Comparison between the previous pipeline (a) and our proposed GA-GS (b). Previous methods supervise 3D Gaussian primitives solely based on background regions after dynamic object removal. In contrast, our GA-GS leverages a diffusion model to generate occluded content for auxiliary supervision and introduces authenticity-driven rendering to balance real and generated information.
  • Figure 2: An overview of our GA-GS pipeline. We use VGGT [wang2025vggt] to obtain accurate camera poses and per-pixel 3D positions. We then employ a motion-aware SAM-based module to segment moving regions, and use a diffusion model to inpaint occlusions, providing pseudo-ground-truth supervision. In the opacity blending stage, the parameter $\theta$ controls the opacity of each Gaussian primitive, and an image-space mask constrains the final loss (see the sketch after this list).
  • Figure 3: Visualization on the DAVIS dataset. Since ground truth for the static background is unavailable, the first row shows only the input frames containing dynamic objects. Compared to the baselines, our method achieves better visual results in both background reconstruction and occluded-region recovery.
  • Figure 4: Data acquisition process of the Trajectory-Match dataset. (a) We employ a robot-mounted camera platform to ensure precise and repeatable camera trajectories. (b) For each scene, we first capture a dynamic sequence containing moving objects such as pedestrians or vehicles. (c) We then record a corresponding static sequence along the same trajectory after removing dynamic elements, which serves as ground-truth background. This paired acquisition setup enables direct quantitative evaluation of reconstruction performance in occluded regions.
  • Figure 5: Visualizations on the Trajectory-Match dataset. The second row presents the ground truth of the recorded static scene, serving as a reference for comparing the reconstructions of GA-GS and the baseline in the third and fourth rows.
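
Figure 2 notes that an image-space mask is applied to constrain the final loss, and the abstract states that diffusion-inpainted frames serve as pseudo ground truth for occluded regions. The sketch below illustrates one plausible form of such masked supervision; the per-region L1 terms, the mask convention (1 = dynamic/occluded), and the weight `lambda_gen` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_supervision_loss(render: torch.Tensor,
                            real_frame: torch.Tensor,
                            inpainted_frame: torch.Tensor,
                            dynamic_mask: torch.Tensor,
                            lambda_gen: float = 0.5) -> torch.Tensor:
    """Combine real-background and generated pseudo-ground-truth supervision.

    render:          (3, H, W) image rendered from the Gaussians.
    real_frame:      (3, H, W) observed video frame.
    inpainted_frame: (3, H, W) diffusion-inpainted frame (pseudo ground truth).
    dynamic_mask:    (1, H, W) binary mask, 1 where dynamic objects occlude
                     the static scene (assumed convention).
    lambda_gen:      assumed weight down-weighting generated supervision.
    """
    static_mask = 1.0 - dynamic_mask
    # Supervise the visible background with the real frame ...
    loss_real = F.l1_loss(render * static_mask, real_frame * static_mask)
    # ... and the occluded regions with the inpainted pseudo ground truth.
    loss_gen = F.l1_loss(render * dynamic_mask, inpainted_frame * dynamic_mask)
    return loss_real + lambda_gen * loss_gen
```

Down-weighting the generated term reflects that inpainted content is less trustworthy than observed pixels; in GA-GS, the per-primitive authenticity scalars presumably interact with this balance as well.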