Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu; Mulin Yu; Linning Xu; Yuanbo Xiangli; Limin Wang; Dahua Lin; Bo Dai

TL;DR

Scaffold-GS tackles redundancy and fragility in 3D-GS by introducing a structured, anchor-guided representation where a sparse grid of anchor points from SfM grounds the scene and spawns on-the-fly neural Gaussians conditioned on view direction and distance. Neural Gaussian attributes are decoded per view from anchor features, enabling robust, view-adaptive rendering with a compact model size. The method employs anchor growing and pruning to progressively improve scene coverage while keeping inference fast, achieving comparable or superior rendering quality to state-of-the-art methods with significantly reduced storage and real-time performance. This approach enhances robustness to view changes, texture-less regions, and lighting variations, offering practical benefits for large-scale and multi-LOD scenes in real-time neural rendering applications.
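As a rough illustration of how a sparse anchor grid could be derived from SfM points, the sketch below quantizes a point cloud to a voxel grid and keeps one anchor per occupied voxel. The function name `init_anchors`, the `voxel_size` value, and the round-to-nearest quantization are assumptions made for this example, not details confirmed by the text above.

```python
import numpy as np

def init_anchors(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Return one anchor per occupied voxel (hypothetical helper)."""
    # Integer voxel index of every SfM point (round-to-nearest is an assumption).
    idx = np.round(points / voxel_size).astype(np.int64)
    # Collapse all points that fall into the same voxel.
    unique_idx = np.unique(idx, axis=0)
    # Place the anchor at the center of each occupied voxel.
    return unique_idx * voxel_size

# Toy point cloud: two nearby points share a voxel, the third is far away.
pts = np.array([[0.02, 0.01, 0.00],
                [0.04, -0.03, 0.01],
                [1.01, 0.00, 0.00]])
anchors = init_anchors(pts, voxel_size=0.1)
print(anchors.shape)
```

Because nearby points collapse into a single anchor, the anchor count depends on scene occupancy and the chosen voxel size rather than on the raw number of SfM points.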

Abstract

Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed by combining the benefits of primitive-based and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less areas and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed.
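The on-the-fly attribute prediction can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the single linear layer with `tanh` is a hypothetical stand-in for an opacity decoder, the weights are random, and the names `decode_opacity`, `spawn_gaussians` and the default `tau_alpha=0.0` are assumptions. Placing each Gaussian at the anchor position plus a per-anchor scaled offset is likewise assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_opacity(feat, anchor_xyz, cam_xyz, W, b):
    """Hypothetical opacity decoder: one linear layer + tanh."""
    delta = anchor_xyz - cam_xyz                          # (N, 3)
    dist = np.linalg.norm(delta, axis=1, keepdims=True)   # (N, 1) camera-anchor distance
    view_dir = delta / dist                               # (N, 3) unit viewing direction
    x = np.concatenate([feat, dist, view_dir], axis=1)    # (N, C + 4)
    return np.tanh(x @ W + b)                             # (N, k) one opacity per Gaussian

def spawn_gaussians(anchor_xyz, anchor_scale, offsets, alpha, tau_alpha=0.0):
    """Place k Gaussians per anchor and keep only those with alpha >= tau_alpha."""
    # Assumed placement rule: anchor position + offset scaled by the anchor's scale.
    centers = anchor_xyz[:, None, :] + offsets * anchor_scale[:, None, :]  # (N, k, 3)
    keep = alpha >= tau_alpha                                              # (N, k)
    return centers[keep], alpha[keep]

# Toy setup: N anchors, C-dim features, k Gaussians per anchor (all values made up).
N, C, k = 4, 8, 3
feat = rng.normal(size=(N, C))
anchor_xyz = rng.normal(size=(N, 3))
anchor_scale = np.ones((N, 3))
offsets = 0.1 * rng.normal(size=(N, k, 3))
cam_xyz = np.array([0.0, 0.0, -5.0])
W, b = rng.normal(size=(C + 4, k)), np.zeros(k)

alpha = decode_opacity(feat, anchor_xyz, cam_xyz, W, b)
centers, kept_alpha = spawn_gaussians(anchor_xyz, anchor_scale, offsets, alpha)
print(alpha.shape, centers.shape)
```

Since the opacities are recomputed from the camera position, moving `cam_xyz` changes which Gaussians survive the threshold, which is the view-adaptive behaviour the abstract describes. Color, scale and rotation would be decoded analogously by their own decoders.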

Paper Structure (39 sections, 13 equations, 14 figures, 17 tables)

Introduction
Related work
MLP-based Neural Fields and Rendering
Grid-based Neural Fields and Rendering
Point-based Neural Fields and Rendering
Methods
Preliminaries
Scaffold-GS
Anchor Point Initialization
Neural Gaussian Derivation
Anchor Points Refinement
Growing Operation
Pruning Operation
Losses Design
Experiments
...and 24 more sections

Figures (14)

Figure 1: Scaffold-GS represents the scene using a set of 3D Gaussians structured in a dual-layered hierarchy. Anchored on a sparse grid of initial points, a modest set of neural Gaussians is spawned from each anchor to dynamically adapt to various viewing angles and distances. Our method achieves rendering quality and speed comparable to 3D-GS with a more compact model (last row metrics: PSNR/storage size/FPS). Across multiple datasets, Scaffold-GS is more robust in large outdoor scenes and intricate indoor environments with challenging view-dependent observations, e.g. transparency, specularity, reflection, texture-less regions and fine-scale details.
Figure 2: Overview of Scaffold-GS. (a) We start by forming a sparse voxel grid from SfM-derived points. An anchor associated with a learnable scale is placed at the center of each voxel, roughly sculpting the scene occupancy. (b) Within a view frustum, $k$ neural Gaussians are spawned from each visible anchor with offsets $\{\mathcal{O}_k\}$. Their attributes, i.e. opacity, color, scale and quaternion, are then decoded from the anchor feature, the relative camera-anchor viewing direction and the distance using $F_{\alpha}, F_c, F_s, F_q$ respectively. (c) To alleviate redundancy and improve efficiency, only non-trivial neural Gaussians (i.e. $\alpha \geq \tau_\alpha$) are rasterized, following 3D-GS [kerbl3Dgaussians]. The rendered image is supervised via reconstruction ($\mathcal{L}_1$), structural similarity ($\mathcal{L}_{SSIM}$) and a volume regularization ($\mathcal{L}_{vol}$).
Figure 3: Growing operation. We develop an anchor growing policy guided by the gradients of the neural Gaussians. From left to right, we spatially quantize neural Gaussians into multi-resolution voxels ($m\in\{1,2,3\}$) of size $\{\epsilon_{g}^{(m)}\}$. New anchors are added to voxels with aggregated gradients larger than $\{\tau_{g}^{(m)}\}$.
Figure 4: Qualitative comparison of Scaffold-GS and 3D-GS [kerbl3Dgaussians] across diverse datasets [barron2022mipnerf360, Knapitsch2017, DeepBlending2018, VRNeRF]. Patches that highlight the visual differences are marked with arrows and green and yellow insets for clearer visibility. Our approach consistently outperforms 3D-GS on these scenes, with evident advantages in challenging scenarios, e.g. thin geometry and fine-scale details (Mip360-Room(a), Mip360-Counter(a)), texture-less regions (DB-DrJohnson, DB-Playroom), light effects (Mip360-Counter(b), DB-DrJohnson), and insufficient observations (TandT-Train, VR-Kitchen). It can also be observed (e.g. VR-Apartment) that our model is superior in representing contents at varying scales and viewing distances.
Figure 5: Comparison on multi-scale scenes (w/ zoom-in cases). We showcase the rendering outcomes at an unseen closer scale on the Amsterdam scene from BungeeNeRF. Our method smoothly extrapolates to new viewing distances using refined neural Gaussian properties, remedying the needle-like artifacts of the original 3D-GS caused by fixed Gaussian scaling values.
...and 9 more figures
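The growing operation described in the Figure 3 caption can be sketched roughly as below. This is an illustrative sketch under stated assumptions: the per-voxel aggregation is taken to be a mean of per-Gaussian gradient magnitudes, voxels that already hold an anchor are skipped, and the voxel sizes and thresholds in the toy example are made up. The helper name `grow_anchors` is hypothetical.

```python
import numpy as np

def grow_anchors(centers, grads, anchors, voxel_sizes, thresholds):
    """Add anchors in voxels whose mean gradient exceeds the level's threshold."""
    anchors = np.asarray(anchors, dtype=float)
    for eps, tau in zip(voxel_sizes, thresholds):          # one pass per resolution m
        # Quantize neural Gaussian centers to this level's voxel grid.
        idx = np.round(centers / eps).astype(np.int64)
        uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
        inverse = inverse.reshape(-1)
        # Mean gradient per voxel (mean aggregation is an assumption).
        mean_grad = np.bincount(inverse, weights=grads) / np.bincount(inverse)
        # Voxels at this level that already contain an anchor.
        occupied = {tuple(v) for v in np.round(anchors / eps).astype(np.int64)}
        fresh = [v * eps for v in uniq[mean_grad > tau] if tuple(v) not in occupied]
        if fresh:
            anchors = np.vstack([anchors, np.array(fresh)])
    return anchors

# Toy data: two low-gradient Gaussians near the origin, two high-gradient ones near x = 2.
centers = np.array([[0.0, 0, 0], [0.1, 0, 0], [2.0, 0, 0], [2.3, 0, 0]])
grads = np.array([0.001, 0.001, 0.03, 0.09])
grown = grow_anchors(centers, grads,
                     anchors=[[0.0, 0.0, 0.0]],
                     voxel_sizes=[1.0, 0.5, 0.25],      # hypothetical epsilon_g^(m)
                     thresholds=[0.01, 0.02, 0.04])     # hypothetical tau_g^(m)
print(grown)
```

In this toy run the well-fit region around the origin gains no anchors, while the high-gradient region near x = 2 receives one new anchor at each of the three resolutions.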
