GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li

TL;DR

GS-Net is a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds to enhance geometric structure representation. To the best of the authors' knowledge, it is the first plug-and-play 3DGS module with cross-scene generalization capabilities.

Abstract

3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.
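To make the reported gains concrete, the following is a minimal sketch of the standard PSNR definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The function names `psnr` and `mse_ratio_for_gain`, and the assumption that images are floats in $[0, 1]$, are illustrative choices and are not taken from the paper's evaluation code.

```python
import numpy as np


def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_val]; this range is an assumption
    of the sketch, not something stated in the paper.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


# Toy check: a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
reference = np.full((4, 4, 3), 0.5)
rendered = reference + 0.1
print(f"PSNR: {psnr(rendered, reference):.2f} dB")
print(f"MSE ratio for a 2.08 dB gain: {mse_ratio_for_gain(2.08):.3f}")
```

Under this definition, a 2.08 dB gain on a single image corresponds to the mean squared error shrinking to roughly 62% of its previous value. Averages over many images, as reported in the paper, need not map to MSE this directly.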

Paper Structure

This paper contains 18 sections, 4 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: The pipeline of GS-Net. We take sparse point clouds as input and output the predicted dense ellipsoid array. Each input point generates $T$ Gaussian ellipsoids, which allows for a higher-quality and denser representation of local scenes. The light-colored arrows on the Predicted Ellipsoid Array image indicate this generation.
  • Figure 2: Samples from the CARLA-NVS dataset. We showcase RGB, depth, and semantic segmentation images, as well as LiDAR point clouds, from two scenes under different viewpoints and weather conditions.
  • Figure 3: Sensor configuration on the vehicle in the CARLA-NVS dataset.
  • Figure 4: Rendering results on the CARLA-NVS dataset. GS-Net achieves significant improvements over 3DGS, especially in rendering texture-less surfaces and fine details such as lane markings.
  • Figure 5: Visualization of Delta Learning Effects. (a) shows our sparse point cloud. (b) represents the output from learning absolute positions. (c) displays the output from learning delta positions, which adheres to the scene's geometric structure and achieves densification.
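The captions for Figures 1 and 5 describe two ideas: each sparse input point yields $T$ ellipsoids, and positions are learned as deltas relative to the input point rather than as absolute coordinates. A minimal NumPy sketch of that position step follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `densify_with_deltas` is hypothetical, random offsets stand in for the network's predictions, and only ellipsoid centers are modeled.

```python
import numpy as np


def densify_with_deltas(points, deltas):
    """Turn N sparse points into N*T ellipsoid centers via per-point offsets.

    points: (N, 3) sparse point positions (e.g. from SfM).
    deltas: (N, T, 3) offsets; in GS-Net these would come from the network,
            here they are supplied by the caller (assumption of this sketch).
    Returns an (N*T, 3) array of centers, grouped by parent point.
    """
    points = np.asarray(points, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)
    # Broadcast each parent point across its T offsets: (N, 1, 3) + (N, T, 3).
    centers = points[:, None, :] + deltas
    return centers.reshape(-1, 3)


rng = np.random.default_rng(0)
N, T = 5, 4
sparse_points = rng.uniform(-10.0, 10.0, size=(N, 3))
# Small random offsets as a stand-in for predicted deltas.
deltas = rng.normal(scale=0.05, size=(N, T, 3))

dense_centers = densify_with_deltas(sparse_points, deltas)
print(dense_centers.shape)  # (20, 3): T centers per input point
```

Because every output center is anchored to its parent point, small deltas keep the densified set close to the input geometry, which is consistent with the behavior the Figure 5 caption attributes to delta learning. Predicting absolute positions offers no such anchoring.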