HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, Jieping Ye

TL;DR

HybridGS introduces a novel hybrid representation that decouples transient objects from static scene content by using per-image 2D Gaussians for transients and multi-view-consistent 3D Gaussians for statics. A multi-view regulated supervision scheme guides 3D Gaussians across co-visible regions, complemented by a three-stage training strategy that alternates and then jointly optimizes both components. The approach yields state-of-the-art novel-view synthesis on challenging indoor/outdoor datasets with distractors, while reducing storage and computation compared to traditional 3DGS. This work lays a robust, efficient foundation for handling transient content in casually captured scenes without reliance on semantic priors, with potential extensions to illumination variability and appearance modeling.

Abstract

Generating high-quality novel view renderings with 3D Gaussian Splatting (3DGS) in scenes featuring transient objects is challenging. We propose a novel hybrid representation, termed HybridGS, that uses 2D Gaussians for the transient objects in each image and retains traditional 3D Gaussians for the static scene as a whole. 3DGS itself is better suited to modeling static scenes, which satisfy multi-view consistency, whereas transient objects appear only occasionally and do not adhere to this assumption; we therefore model them as planar objects seen from a single view, represented with 2D Gaussians. Our representation decomposes the scene on the basis of viewpoint consistency, which makes the decomposition more principled. Additionally, we present a novel multi-view regulated supervision method for 3DGS that leverages information from co-visible regions, further sharpening the distinction between transients and statics. Finally, we propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis across various settings. Experiments on benchmark datasets show state-of-the-art novel view synthesis performance in both indoor and outdoor scenes, even in the presence of distracting elements.

Paper Structure

This paper contains 30 sections, 12 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: HybridGS is the first hybrid representation that combines multi-view consistent 3D Gaussians and single-view independent 2D Gaussians, which is used to decouple the transients and statics present in the scene. Our results demonstrate reasonable decompositions.
  • Figure 2: Overview. Given a casually captured image sequence, we decompose the whole scene into 2D Gaussians for transient objects and 3D Gaussians for the static scene. To warm up, we start by training a basic 3DGS to capture static elements. This is followed by iterative training of the 2D and 3D Gaussians, where the transients and statics are combined using an $\alpha$-blending strategy with masks to produce the final renderings. The masks provide guidance for the 3D Gaussians in the iterative training stage. During joint training, both the 2D and 3D Gaussians are trained to further optimize the decomposition results.
  • Figure 3: Visualization of novel view synthesis results on the testing set of the NeRF On-the-go dataset. Our method demonstrates superior results by effectively reducing artifacts and providing clearer boundaries. This results in cleaner statics compared to other methods, showcasing enhanced visual quality and precision in novel views.
  • Figure 4: Visualization of novel view synthesis results on the RobustNeRF dataset.
  • Figure 5: Comparison of transient masks on the NeRF On-the-go dataset.
  • ...and 6 more figures
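
The Figure 2 caption says transients and statics are combined using an $\alpha$-blending strategy with masks. The exact compositing formula is not given on this page, so the following is only a minimal sketch under an assumed convention: a per-pixel mask value of 1 means the pixel comes entirely from the transient (2D Gaussian) layer, and 0 means it comes entirely from the static (3D Gaussian) layer. The function name `blend_transient_static` and all input values are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of floats in [0, 1]


def blend_transient_static(static: Image, transient: Image, mask: Image) -> Image:
    """Composite a transient layer over a static layer with a per-pixel mask.

    Assumed convention (not taken from the paper): mask = 1 means the pixel
    is fully explained by the transient layer, mask = 0 by the static layer.
    """
    if not (len(static) == len(transient) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")
    out: Image = []
    for s_row, t_row, m_row in zip(static, transient, mask):
        if not (len(s_row) == len(t_row) == len(m_row)):
            raise ValueError("all inputs must have the same row length")
        # Standard alpha blend: out = m * transient + (1 - m) * static
        out.append([m * t + (1.0 - m) * s for s, t, m in zip(s_row, t_row, m_row)])
    return out


# Hypothetical 2x2 example standing in for the two renderings and the mask.
static = [[0.2, 0.2], [0.2, 0.2]]      # stand-in for the 3D-Gaussian rendering
transient = [[0.9, 0.9], [0.9, 0.9]]   # stand-in for the 2D-Gaussian rendering
mask = [[0.0, 1.0], [0.5, 0.0]]        # stand-in for the transient mask

blended = blend_transient_static(static, transient, mask)
```

Under this convention, pixels with a mask of 0 reproduce the static rendering unchanged, which is why such a mask could also serve to indicate where the static layer alone should explain the training image.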