Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
TL;DR
Scaffold-GS addresses the redundancy and fragility of 3D Gaussian Splatting (3D-GS) with a structured, anchor-guided representation. A sparse grid of anchor points derived from SfM grounds the scene, and each anchor spawns neural Gaussians on the fly. The attributes of these Gaussians are decoded per view from the anchor features, conditioned on viewing direction and distance, which gives robust, view-adaptive rendering with a compact model. Anchor growing and pruning progressively improve scene coverage while keeping inference fast. The method achieves rendering quality comparable or superior to state-of-the-art methods with significantly reduced storage and real-time performance. It is also more robust to view changes, texture-less regions, and lighting variations, which benefits large-scale and multi-LOD scenes in real-time neural rendering applications.
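As an illustration of the "sparse grid of anchor points from SfM" idea, the sketch below snaps a point cloud to a voxel grid and keeps one anchor per occupied voxel. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `voxelize_to_anchors`, the `voxel_size` parameter, and the choice of rounding to the nearest grid node are all hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def voxelize_to_anchors(points: Iterable[Point], voxel_size: float) -> List[Point]:
    """Return one anchor per occupied voxel (assumed scheme, for illustration).

    Each point is snapped to the nearest grid node at spacing `voxel_size`;
    duplicates collapse, so dense SfM clusters yield a single anchor.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied = set()
    for x, y, z in points:
        # round() maps each coordinate to an integer grid index
        # (Python rounds exact ties to the nearest even integer).
        occupied.add((round(x / voxel_size),
                      round(y / voxel_size),
                      round(z / voxel_size)))
    # Convert integer indices back to world-space anchor positions.
    return [(i * voxel_size, j * voxel_size, k * voxel_size)
            for i, j, k in sorted(occupied)]


# Hypothetical toy point cloud: two nearby points and one distant point.
sfm_points = [(0.01, 0.02, 0.0), (0.03, -0.01, 0.02), (1.02, 0.0, 0.0)]
anchors = voxelize_to_anchors(sfm_points, voxel_size=0.5)
```

Here the two nearby points collapse into a single anchor, so the anchor set is sparser than the input cloud.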
Abstract
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed by combining the benefits of primitive-based and volumetric representations. However, it often produces heavily redundant Gaussians that try to fit every training view while neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less areas, and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing rendering speed.
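To make the on-the-fly prediction concrete, the sketch below computes the viewing distance and direction for one anchor, feeds them with the anchor feature into a decoder, and places the resulting Gaussians around the anchor. It is a toy sketch, not the paper's architecture: the single-layer `tanh` decoder, the hand-set `weights`, the fixed `offsets` and `scale`, and the `min_opacity` threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def view_inputs(anchor: Vec3, camera: Vec3) -> Tuple[float, Vec3]:
    """Return (distance, unit direction) from the camera to the anchor."""
    delta = tuple(a - c for a, c in zip(anchor, camera))
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        raise ValueError("camera coincides with anchor")
    return dist, tuple(d / dist for d in delta)


def decode_opacities(feature: Sequence[float], dist: float, direction: Vec3,
                     weights: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> List[float]:
    """Hypothetical one-layer decoder: one tanh output per neural Gaussian."""
    x = list(feature) + [dist] + list(direction)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def spawn_gaussians(anchor: Vec3, offsets: Sequence[Vec3], scale: float,
                    opacities: Sequence[float], min_opacity: float = 0.0):
    """Place Gaussians at anchor + scale * offset, keeping opacity > min_opacity."""
    kept = []
    for off, alpha in zip(offsets, opacities):
        if alpha > min_opacity:
            center = tuple(a + scale * o for a, o in zip(anchor, off))
            kept.append((center, alpha))
    return kept


# Toy setup: a 2-D anchor feature and two candidate Gaussians per anchor.
anchor = (0.0, 0.0, 0.0)
feature = [1.0, 0.0]
offsets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Input layout is [f0, f1, dist, dir_x, dir_y, dir_z]; these hand-set weights
# make each Gaussian respond to the sign of dir_z only.
weights = [[0, 0, 0, 0, 0, 1.0], [0, 0, 0, 0, 0, -1.0]]
biases = [0.0, 0.0]


def render_set(camera: Vec3):
    """Decode and spawn the Gaussians visible from a given camera position."""
    dist, direction = view_inputs(anchor, camera)
    alphas = decode_opacities(feature, dist, direction, weights, biases)
    return spawn_gaussians(anchor, offsets, scale=0.5, opacities=alphas)


front = render_set((0.0, 0.0, -2.0))
back = render_set((0.0, 0.0, 2.0))
```

With these toy weights, the two camera positions activate different Gaussians from the same anchor, which is the view-adaptive behaviour the abstract describes. In a trained model the decoder weights would be learned rather than set by hand.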
