Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge

TL;DR

This work tackles the high storage footprint of 3D Gaussian Splatting (3DGS) with a compact compression framework that uses tri-plane feature planes to predict all Gaussian attributes and optimizes them with frequency-domain entropy modeling and channel-aware bit allocation. By transforming the planes with a 2D DCT and combining a per-channel entropy objective with channel-importance-guided weighting, the method exploits spatial correlations and integrates with standard video codecs to improve rate-distortion performance. Across Mip-NeRF360, DeepBlending, and Tanks & Temples, the approach stores each scene in under 10 MB, with up to 76$\times$ compression and negligible PSNR loss, while maintaining rendering speed comparable to the original 3DGS. This enables practical, codec-friendly 3D scene compression suitable for mobile devices and broad deployment, and points to further gains as video codecs evolve.
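To make the frequency-domain idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It applies a separable 2D DCT-II to each channel of a feature plane and estimates bits per coefficient from a histogram of the quantized coefficients. The function names, the quantization step, and the use of an empirical (non-differentiable) histogram entropy in place of a learned entropy model are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)  # DC row scaling keeps the matrix orthonormal
    return basis


def dct2(plane: np.ndarray) -> np.ndarray:
    """Separable 2D DCT-II of a single (H, W) channel."""
    h, w = plane.shape
    return dct_matrix(h) @ plane @ dct_matrix(w).T


def channel_entropy_bits(planes: np.ndarray, step: float = 1.0) -> np.ndarray:
    """Empirical entropy (bits per coefficient) of each channel's quantized DCT.

    planes has shape (C, H, W). The histogram entropy is an illustrative
    stand-in (assumption) for a learned, differentiable entropy model.
    """
    bits = []
    for channel in planes:
        q = np.round(dct2(channel) / step).astype(np.int64)
        _, counts = np.unique(q, return_counts=True)
        p = counts / counts.sum()
        bits.append(float(-(p * np.log2(p)).sum()))
    return np.array(bits)


# Hypothetical two-channel plane: one smooth channel, one noisy channel.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:32, 0:32]
smooth = 10.0 * np.cos(np.pi * xx / 32.0) * np.cos(np.pi * yy / 32.0)
noisy = rng.normal(scale=10.0, size=(32, 32))
entropy = channel_entropy_bits(np.stack([smooth, noisy]))
```

In this toy setup the smooth channel concentrates its energy in a few low-frequency coefficients, so its estimated entropy is much lower than that of the noisy channel. Spatially correlated planes are cheaper to code for the same reason, and a per-channel estimate of this kind is one plausible input to channel-wise bit allocation.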

Abstract

3D Gaussian Splatting is a recognized method for 3D scene representation, known for its high rendering quality and speed. However, its substantial data requirements present challenges for practical applications. In this paper, we introduce an efficient compression technique that significantly reduces storage overhead by using a compact representation. We propose a unified architecture that combines point cloud data and feature planes through a progressive tri-plane structure. Our method utilizes 2D feature planes, enabling continuous spatial representation. To further optimize these representations, we incorporate entropy modeling in the frequency domain, specifically designed for standard video codecs. We also propose channel-wise bit allocation to achieve a better trade-off between bitrate consumption and feature plane representation. Consequently, our model effectively leverages spatial correlations within the feature planes to enhance rate-distortion performance using standard, non-differentiable video codecs. Experimental results demonstrate that our method outperforms existing methods in data compactness while maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS.
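The claim that 2D feature planes give a continuous spatial representation can be illustrated with a generic tri-plane lookup. The sketch below is a standard bilinear tri-plane sampler, not the paper's architecture. It assumes points are already normalized to [0, 1] per axis, that the three plane features are concatenated, and that a decoder (omitted here) would map the result to Gaussian attributes. The names `bilinear_sample` and `triplane_features` are hypothetical.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly interpolate a (C, H, W) plane at normalized coords u, v in [0, 1]."""
    _, h, w = plane.shape
    x = float(np.clip(u, 0.0, 1.0)) * (w - 1)  # u indexes the width axis
    y = float(np.clip(v, 0.0, 1.0)) * (h - 1)  # v indexes the height axis
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * plane[:, y0, x0] + fx * plane[:, y0, x1]
    bottom = (1 - fx) * plane[:, y1, x0] + fx * plane[:, y1, x1]
    return (1 - fy) * top + fy * bottom


def triplane_features(planes: dict, point) -> np.ndarray:
    """Concatenate features sampled from the XY, XZ, and YZ planes (assumed fusion)."""
    x, y, z = point
    return np.concatenate([
        bilinear_sample(planes["xy"], x, y),
        bilinear_sample(planes["xz"], x, z),
        bilinear_sample(planes["yz"], y, z),
    ])


# Hypothetical planes: channel 0 ramps along the width, channel 1 along the height.
ramp_w = np.tile(np.arange(5.0), (5, 1))
ramp = np.stack([ramp_w, ramp_w.T])  # shape (2, 5, 5)
planes = {"xy": ramp, "xz": ramp, "yz": ramp}
features = triplane_features(planes, (0.375, 0.5, 1.0))
```

Because the lookup interpolates between grid cells, any point in the normalized volume yields a feature vector, even though only three 2D grids are stored. Those grids are image-like, which is what makes them natural inputs for a video codec.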

Paper Structure (12 sections, 9 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Our method achieves 146$\times$ compression with negligible loss in image quality, a significant improvement over 3DGS (Kerbl et al., 2023) on the 'bicycle' scene. Our method seamlessly integrates with standard video codecs and utilizes the original 3DGS rendering pipeline, achieving comparable rendering speeds with minimal overhead.
  • Figure 2: Overview of the proposed model. Following the original 3DGS densification, we train the model to predict all Gaussian attributes using the feature plane architecture. The feature plane achieves a more compact representation through our proposed DCT entropy modeling and channel bit allocation techniques, which leads to performance improvements.
  • Figure 3: Visualization of the first channel in the XY plane $\mathcal{P}_{1}^{XY}$ for the 'Bonsai' scene. For better visualization, we used a very large weight for each loss term. (a) With the sparsity $\mathcal{L}_{1}$ loss and (b) with the proposed DCT entropy $\mathcal{L}_{ent}$ loss. We use the piecewise-projective contraction detailed in Section 4.1.
  • Figure 4: Visualization of the 1st, 3rd, and 5th channels of the XZ plane with progressive training for the 'Flowers' scene. The iteration stages $T_{i}$ are {0, 5000, 10000}, with corresponding $L_{i}$ values of {2, 4, 6}, for 30k-iteration training. With progressive training, parameter energy is mainly concentrated in the lower-level channels, while the higher-level channels show sparser representations.
  • Figure 5: RD curves for quantitative comparisons. Rate-distortion (RD) plots are provided for each dataset using 3DGS compression models. For the x-axis, a log$_{2}$ scale is used for better visualization.
  • ...and 7 more figures
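
The progressive schedule quoted in the Figure 4 caption can be read as a step function of the training iteration. The sketch below uses only the values stated there ($T_{i}$ = {0, 5000, 10000}, $L_{i}$ = {2, 4, 6}). Treating $L_{i}$ as the number of active channel levels from stage $T_{i}$ onward is an assumption, as are the names `STAGE_STARTS`, `STAGE_LEVELS`, and `active_levels`.

```python
from bisect import bisect_right

STAGE_STARTS = (0, 5000, 10000)  # T_i from the Figure 4 caption
STAGE_LEVELS = (2, 4, 6)         # L_i from the Figure 4 caption


def active_levels(iteration: int) -> int:
    """Return the L_i of the latest stage whose start T_i is <= iteration.

    Assumption: L_i is the number of channel levels being trained from
    stage T_i onward; the caption does not spell this out.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    stage = bisect_right(STAGE_STARTS, iteration) - 1
    return STAGE_LEVELS[stage]


# Levels at a few checkpoints of a 30k-iteration run.
schedule = {it: active_levels(it) for it in (0, 4999, 5000, 10000, 29999)}
```

Under this reading, the lower levels are optimized for the whole run while the higher levels join later, which is consistent with the caption's observation that energy concentrates in the lower-level channels.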