Reducing the Memory Footprint of 3D Gaussian Splatting

Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, George Drettakis

TL;DR

This work proposes an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half, an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and a codebook-based quantization method together with a half-float representation for further memory reduction.

Abstract

3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storage and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive. Finally, we propose a codebook-based quantization method, together with a half-float representation, for further memory reduction. Taken together, these three components result in a 27× reduction in overall size on disk on the standard datasets we tested, along with a 1.7× speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device.
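
To make the three storage factors concrete (primitive count, number of spherical harmonics coefficients, and precision per value), the following minimal sketch computes a rough per-primitive and per-scene size. The attribute list follows the Figure 1 caption; the component counts per attribute, the function names, and the choice of one uniform precision for every attribute are assumptions of this sketch, not the paper's storage format, and the scene sizes are illustrative numbers rather than measured results.

```python
# Scalar values stored per primitive, excluding higher-order SH bands.
# Component counts are assumptions of this sketch (e.g. a 3-vector scale).
BASE_VALUES = {
    "position": 3,
    "rotation_quaternion": 4,
    "scale": 3,
    "opacity": 1,
    "color_rgb": 3,  # view-independent color (SH band 0)
}


def sh_extra_values(bands: int) -> int:
    """Scalars needed for SH bands 1..bands across 3 color channels."""
    if not 0 <= bands <= 3:
        raise ValueError("bands must be between 0 and 3")
    # (bands + 1)^2 coefficients per channel, minus the band-0 term
    # that is already counted as color_rgb.
    return 3 * ((bands + 1) ** 2 - 1)


def bytes_per_primitive(bands: int, bytes_per_value: int = 4) -> int:
    """Size of one primitive when every value uses the same precision."""
    return (sum(BASE_VALUES.values()) + sh_extra_values(bands)) * bytes_per_value


def scene_megabytes(num_primitives: int, bands: int, bytes_per_value: int = 4) -> float:
    """Total size in MB (10^6 bytes) for a scene of identical primitives."""
    return num_primitives * bytes_per_primitive(bands, bytes_per_value) / 1e6


full = bytes_per_primitive(3)        # 3 SH bands, 32-bit floats
rgb_only = bytes_per_primitive(0)    # no view-dependent bands
half = bytes_per_primitive(3, 2)     # 3 SH bands, 16-bit half floats
print(full, rgb_only, half)          # prints: 236 56 118
print(scene_megabytes(1_000_000, 3)) # prints: 236.0
```

Under these assumptions, the 45 higher-order SH values account for most of the 59 values per primitive, which is why reducing the band count per primitive and the precision per value compounds with halving the primitive count.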

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: 3DGS produces photo-realistic renderings from input views and sparse points. Rendering the Gaussians as ellipsoids instead of splats reveals that each scene is modeled by millions of primitives. Each primitive stores a significant amount of information: position $p$, rotation quaternion $q$, scale $s$, opacity $\alpha$, color, and 3 bands of spherical harmonics. This leads to the exceedingly high memory consumption of 3DGS scenes.
  • Figure 2: Left: During training, every 1000 iterations our method evaluates a redundancy score in space, which is then projected to the primitives. Redundant primitives are then culled. Middle: At 15K iterations, when densification stops, our method analyzes the SH coefficients to determine which primitives can be represented with 0 (just RGB), 1, 2, or 3 SH bands, which allows us to omit storing unnecessary SH coefficients. Right: Finally, at the end of training, we perform a codebook quantization of the remaining values, except for primitive positions. The relative reduction of each stage is shown in the figure, for a total 27× reduction in memory with a 0.21 dB PSNR drop on average over all our datasets.
  • Figure 3: Our resolution- and scale-aware redundancy metric measures how necessary a Gaussian is to represent the scene. a) Each camera can capture detail only down to a specific resolution: the farther we move from the camera, the lower the spatial resolution this camera can represent. b) Given multiple cameras for a given primitive in the scene, multiple resolutions can be represented. c) For each Gaussian $g_i$, we consider the highest resolution $a_{min}^i$ given the input cameras, and count the number of Gaussians that intersect this region. We then prune the Gaussians that intersect regions influenced by more than K other Gaussians. In this example, Gaussian $g_0$ will not be pruned because there is at least one region, $a_{min}^0$, influenced by no other Gaussian. In contrast, Gaussian $g_2$ intersects regions $a_{min}^{1,2,3}$, all of which have many Gaussians influencing them; hence $g_2$ is a good candidate for pruning.
  • Figure 4: From Left to Right: Primitive Reduction only (A), Primitive Reduction and Adaptive SH (B), full method, error image between ours and baseline, and the baseline (original 3DGS). We show the scenes: Bicycle from MipNeRF360, Playroom from Deep Blending, and Truck from Tanks&Temples.
  • Figure 5: Visual comparison between INGP, MeRF, 3DGS and ours.
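
The redundancy test described in the Figure 3 caption can be sketched as follows. This is a simplified illustration, not the paper's implementation: the `Gaussian` class, the isotropic `radius` standing in for each primitive's extent, the spherical region of radius `a_min` around each center, and the single fixed threshold `k` are all assumptions of this sketch. A Gaussian's score is the smallest neighbor count among the regions it touches, so it survives if at least one of those regions is sparsely covered.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    center: tuple   # (x, y, z)
    radius: float   # hypothetical isotropic extent of the primitive
    a_min: float    # size of the finest region any camera resolves at center


def intersects(g: Gaussian, owner: Gaussian) -> bool:
    """True if g overlaps the a_min region centered on owner."""
    return math.dist(g.center, owner.center) <= g.radius + owner.a_min


def region_counts(gs):
    """For each region, count the *other* Gaussians that intersect it."""
    return [
        sum(1 for j, g in enumerate(gs) if j != i and intersects(g, owner))
        for i, owner in enumerate(gs)
    ]


def redundancy_scores(gs):
    """Score of g = least crowded region that g intersects (its own included)."""
    counts = region_counts(gs)
    return [
        min(counts[i] for i, owner in enumerate(gs) if intersects(g, owner))
        for g in gs
    ]


def prune(gs, k: int):
    """Keep Gaussians that touch at least one region with <= k other Gaussians."""
    return [g for g, s in zip(gs, redundancy_scores(gs)) if s <= k]


# Four overlapping Gaussians in a tight cluster plus one isolated Gaussian.
scene = [
    Gaussian((0.00, 0.0, 0.0), 0.5, 0.1),
    Gaussian((0.05, 0.0, 0.0), 0.5, 0.1),
    Gaussian((0.10, 0.0, 0.0), 0.5, 0.1),
    Gaussian((0.15, 0.0, 0.0), 0.5, 0.1),
    Gaussian((10.0, 0.0, 0.0), 0.5, 0.1),
]
print(redundancy_scores(scene))  # prints: [3, 3, 3, 3, 0]
kept = prune(scene, k=2)
print(len(kept))                 # prints: 1
```

In this toy scene the isolated Gaussian plays the role of $g_0$ in the caption: its own region is influenced by no other Gaussian, so it is kept. Note that a one-shot fixed threshold, as used here, flags every Gaussian in the cluster at once; the sketch only illustrates the scoring, and how candidates are selected among high-scoring Gaussians is not specified by the caption.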