A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Bernhard Kerbl, Andréas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis

TL;DR

This work tackles the scalability barrier in novel-view synthesis by introducing a hierarchical 3D Gaussian representation with a Level-of-Detail (LOD) mechanism. A divide-and-conquer pipeline trains per-chunk Gaussians and merges them into a global, renderable hierarchy, enabling real-time navigation of city-scale scenes captured with tens of thousands of images on consumer-grade hardware. Key contributions include (i) a BVH-based hierarchy over 3D Gaussian primitives with depth- and view-aware optimization, (ii) an efficient cut and interpolation strategy for smooth LOD transitions, and (iii) a chunk-based training and consolidation framework that supports parallel processing and scalable rendering. The paper reports real-time rendering of large environments, supported by qualitative and quantitative evaluation, with practical relevance for scalable radiance-field representations and city-scale reconstruction.

Abstract

Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering a Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels. We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/

Paper Structure (40 sections, 12 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: (a) The blue and red 3D Gaussians are leaf primitives that are projected to 2D (b). We visualize a scanline (black line in (c)) in 2D and plot the corresponding $\alpha$-blending weights $\alpha_r$ and $\alpha_b$ for red and blue respectively. The cumulative effect of blending according to their opacity (Eq. \ref{eq:blend}) is shown in (e); we see that the effect is a non-Gaussian cumulative fall-off. We want to create an intermediate node to represent the two leaves, shown in purple (a). Taking a scanline through the purple projected intermediate node (g), we show that the falloff value we introduce to replace opacity achieves a similar slower fall-off effect (f); however the value can be larger than 1, which we clamp appropriately during $\alpha$-blending (see text).
  • Figure 2: (a) The granularity $\epsilon(n)$ of the green node $n$ is defined as the projected screen size of the node. (b) Nodes satisfying target granularity $\tau_{\epsilon}$ of e.g., 1 pixel or less are included in the cut for a given view $V$. In practice, $\tau_{\epsilon}$ is a threshold for projected AABB axis lengths $\epsilon(n)$ of each node $n$.
  • Figure 3: (a) The ambiguity of the rotation axes of Gaussians can result in undesired rotations when interpolating between nodes. (b) When switching nodes, the two children are rendered with the same parameters as the purple parent, and progressively interpolated towards their separate values.
  • Figure 4: (a) Ground truth image and (b) fine-detail setting of our method. In contrast to conventional multi-scale training, we supervise all levels on the full resolution instead of rescaled images (c), which would encourage blur. Our approach preserves sharp high-frequency features at much coarser settings (d).
  • Figure 5: (a) The purple node contains the red and blue nodes in the hierarchy. The most common understanding of LOD is based on distance $\Delta$: when the viewpoint is close, we descend in the hierarchy (b), whereas when it is farther away we use the higher-level node (c). Corresponding cuts are illustrated in (d).
  • ...and 5 more figures
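
The cut selection described in the captions of Figures 2 and 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the `granularity` and `select_cut` functions, and the pinhole approximation (focal length in pixels times the longest AABB axis length, divided by distance) are assumptions made for illustration. The sketch descends the hierarchy from the root and stops at the first node whose projected size $\epsilon(n)$ is at or below the target granularity $\tau_{\epsilon}$, or at a leaf.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """Hypothetical hierarchy node: an AABB summarised by its center and longest axis."""
    name: str
    center: tuple            # AABB center (x, y, z) in world units
    extent: float            # longest AABB axis length in world units
    children: list = field(default_factory=list)


def granularity(node, cam_pos, focal_px):
    """Approximate projected size of the node in pixels (assumed pinhole model)."""
    dist = math.dist(node.center, cam_pos)
    return focal_px * node.extent / max(dist, 1e-6)


def select_cut(root, cam_pos, focal_px, tau=1.0):
    """Return the nodes forming the cut for one view.

    A node joins the cut if it is a leaf or if its projected size is
    at most tau pixels; otherwise its children are visited instead.
    """
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children or granularity(node, cam_pos, focal_px) <= tau:
            cut.append(node)
        else:
            stack.extend(node.children)
    return cut


# Two leaves (red, blue) under one intermediate node (purple), as in Figure 5.
red = Node("red", (-0.5, 0.0, 0.0), 1.0)
blue = Node("blue", (0.5, 0.0, 0.0), 1.0)
purple = Node("purple", (0.0, 0.0, 0.0), 2.0, [red, blue])

far_cut = select_cut(purple, (0.0, 0.0, 5000.0), focal_px=1000.0)
near_cut = select_cut(purple, (0.0, 0.0, 10.0), focal_px=1000.0)
print([n.name for n in far_cut])           # distant view: the parent alone
print(sorted(n.name for n in near_cut))    # close view: both leaves
```

With these example values the distant camera keeps only the purple parent, while the close camera descends to the red and blue leaves, which matches the distance-based behaviour sketched in Figure 5. The smooth interpolation between parent and children shown in Figure 3 is deliberately left out of this sketch.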