SAGS: Structure-Aware 3D Gaussian Splatting

Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou

TL;DR

We address the limitations of geometry-agnostic 3D Gaussian Splatting (3D-GS) in novel-view synthesis, which often yields floaters and depth errors. We propose Structure-Aware Gaussian Splatting (SAGS), a pipeline that uses a local-global graph to encode scene geometry, a curvature-aware densification step, a structure-aware encoder, and a refinement network to predict Gaussian attributes, plus a lightweight SAGS-Lite variant with mid-point interpolation for compact representations. Across 13 scenes from Mip-NeRF360, Tanks&Temples, and Deep Blending, SAGS achieves state-of-the-art rendering quality while reducing model storage by up to 11.7× for the full model and up to 24× for the lite variant, all while maintaining real-time rendering. Our results demonstrate that enforcing structure preserves scene topology and depth, reduces artifacts, and enables efficient, high-fidelity neural rendering suitable for VR/AR applications.

Abstract

Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering, overcoming the computational burden of volumetric methods. Building on the pioneering work of 3D-GS, several methods have attempted to provide compressible and high-fidelity alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inherent 3D structure of the scene, thereby restricting the expressivity and quality of the representation and producing floaters and artifacts. In this work, we propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene, which translates into state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets. SAGS is founded on a local-global graph representation that facilitates the learning of complex scenes and enforces meaningful point displacements that preserve the scene's geometry. Additionally, we introduce a lightweight version of SAGS, using a simple yet effective mid-point interpolation scheme, which yields a compact representation of the scene with up to 24$\times$ size reduction without relying on any compression strategies. Extensive experiments across multiple benchmark datasets demonstrate the superiority of SAGS over state-of-the-art 3D-GS methods in terms of both rendering quality and model size. In addition, we demonstrate that our structure-aware method can effectively mitigate the floating artifacts and irregular distortions of previous methods while obtaining precise depth maps. Project page: https://eververas.github.io/SAGS/.
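The abstract names a mid-point interpolation scheme for the lightweight variant but gives no details here. The following is only a minimal sketch of one way such a scheme could work, under the assumption that new points are generated at the midpoints of graph edges and that their features are the average of the two endpoint features. The function name `midpoint_interpolate` and the data layout (tuples for points and features, index pairs for edges) are hypothetical and not taken from the paper.

```python
def midpoint_interpolate(points, features, edges):
    """Create one extra point per undirected edge, placed at the edge midpoint.

    Assumption (not from the paper): the feature of a generated point is the
    plain average of the two endpoint features.
    """
    seen = set()
    new_points, new_features = [], []
    for i, j in edges:
        key = (min(i, j), max(i, j))
        # Skip self-loops and edges already handled in the other direction.
        if i == j or key in seen:
            continue
        seen.add(key)
        new_points.append(
            tuple((a + b) / 2.0 for a, b in zip(points[i], points[j]))
        )
        new_features.append(
            tuple((a + b) / 2.0 for a, b in zip(features[i], features[j]))
        )
    return new_points, new_features
```

Under this assumption only the original points and their features need to be stored, while the midpoints are regenerated from the graph when needed, which is one plausible reading of how a compact representation could be obtained without a separate compression step.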

Paper Structure (13 sections, 10 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Structure-Aware GS (SAGS) leverages the intrinsic structure of the scene and enforces point interaction using graph neural networks, outperforming the structure-agnostic optimization scheme of 3D-GS [kerbl3Dgaussians]. 3D-GS optimizes each Gaussian independently, which results in 3D floaters and large displacements of points from their original positions (left). This is also visible in the histogram of displacements (right) between the initial and final positions (means) of the 3D Gaussians. Optimization-based methods neglect the scene structure and displace points far from their initial positions to minimize the rendering loss, whereas SAGS predicts displacements that preserve the initial structure. The 3D-GS figures are taken directly from the original 3D-GS website.
  • Figure 2: Overview of the proposed method. Given a point cloud obtained from COLMAP [sfm], we first apply a curvature-based densification step to populate under-represented areas. We then apply $k$-NN search to link points $\mathbf{p}$ within local regions and create a point set graph. Leveraging the inductive biases of graph neural networks, we learn a local-global structural feature $\Phi(\mathbf{p}_i, \mathbf{f}_i)$ for each point. Using a set of small MLPs, we decode the structural features into 3D Gaussian attributes, i.e., color $\boldsymbol{c}$, opacity $\boldsymbol\alpha$, covariance $\boldsymbol{\Sigma}$, and point displacements $\Delta\mathbf{p}$ from the initial point position. Finally, we render the 3D Gaussians with the 3D-GS Gaussian rasterizer [kerbl3Dgaussians].
  • Figure 3: Overview of the densification. Given an initial SfM [sfm] point cloud (left), we estimate the curvature following [pauly2002efficient]. Curvature values are color-coded on the input COLMAP point cloud (middle), where colors closer to purple indicate lower curvature. The curvature-aware densification results in more points populating the low-curvature areas (right).
  • Figure 4: Qualitative comparison. We qualitatively evaluate the proposed method and the baselines (3D-GS [kerbl3Dgaussians] and Scaffold-GS [lu2023scaffold]) across six scenes from different datasets. We highlight detailed differences between the three methods using magnified crops in yellow, and mark additional visual artifacts with red arrows. The proposed method consistently captures more structural and high-frequency details while minimizing floaters and artifacts compared to the baseline methods.
  • Figure 5: Color-coded Gaussian displacements. We measured the displacements of the Gaussians from their original positions on the "train" scene from the Tanks&Temples [Knapitsch2017] dataset and encoded them with a colormap. Colors closer to purple indicate small displacements. Both 3D-GS and Scaffold-GS rely on a rudimentary point optimization approach that neglects the local topology and fails to guide the Gaussians in a structured manner.
  • ...and 3 more figures
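
The captions of Figures 2 and 3 mention a $k$-NN graph over the input points and a per-point curvature estimate. As a rough illustration, the sketch below builds a brute-force $k$-NN neighbour list and computes the surface-variation measure $\lambda_{\min} / (\lambda_0 + \lambda_1 + \lambda_2)$ from the covariance of each local patch, which is the quantity commonly used as a curvature proxy for point-sampled surfaces. Whether SAGS uses exactly this formula, this neighbourhood definition, or this value of $k$ is an assumption. All function names are hypothetical, and the densification step itself is not shown.

```python
import math


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(order[:k])
    return result


def smallest_eigenvalue_sym3(c):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    q = (c[0][0] + c[1][1] + c[2][2]) / 3.0
    p1 = c[0][1] ** 2 + c[0][2] ** 2 + c[1][2] ** 2
    p2 = sum((c[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:
        return q  # the matrix is a multiple of the identity
    b = [[(c[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def surface_variation(points, neighbours):
    """Per-point curvature proxy: smallest covariance eigenvalue divided by the trace."""
    values = []
    for i, nbrs in enumerate(neighbours):
        patch = [points[i]] + [points[j] for j in nbrs]
        n = len(patch)
        mean = [sum(p[d] for p in patch) / n for d in range(3)]
        cov = [
            [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in patch) / n for b in range(3)]
            for a in range(3)
        ]
        trace = cov[0][0] + cov[1][1] + cov[2][2]
        lam_min = max(0.0, smallest_eigenvalue_sym3(cov))
        values.append(lam_min / trace if trace > 0.0 else 0.0)
    return values
```

With this measure, points on a flat patch score close to 0 and points in an isotropic neighbourhood score 1/3, so a threshold on the value would separate low-curvature regions from high-curvature ones. For real SfM point clouds, the quadratic-time neighbour search would need to be replaced by a spatial index.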