Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction

Jingui Ma, Yang Hu, Luyang Tang, Jiayu Yang, Yongqi Zhai, Ronggang Wang

TL;DR

3D Gaussian Splatting (3DGS) enables real-time rendering but incurs heavy storage costs. The authors introduce a spatial condition-based prediction framework in which FP-Net predicts the anchor feature $f$ from a grid-derived context $f_c$ and a learnable residual $f_r$, supplemented by an instance-aware hyper prior that improves entropy estimation for the residual. The approach reports a 24.42% bit-rate saving over the SOTA HAC method and up to a 105x size reduction compared with vanilla 3DGS, while maintaining rendering quality across five datasets. By combining prediction with structured entropy modeling, the work makes 3DGS more practical to deploy; the authors state that code will be released.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has gained widespread attention in Novel View Synthesis (NVS) due to its remarkable real-time rendering performance. However, the substantial storage and transmission cost of vanilla 3DGS (hundreds of megabytes or even gigabytes for a single scene) hinders its further application. Motivated by the achievements of prediction in video compression, we introduce the prediction technique into the anchor-based Gaussian representation to effectively reduce the bit rate. Specifically, we propose a spatial condition-based prediction module that utilizes the grid-captured scene information for prediction, with a residual compensation strategy designed to learn the missing fine-grained information. In addition, to further compress the residual, we propose an instance-aware hyper prior, yielding an entropy model that is both structure-aware and instance-aware. Extensive experiments demonstrate the effectiveness of our prediction-based compression framework and of each technical component. Even compared with the SOTA compression method, our framework still achieves a bit-rate saving of 24.42 percent. Code is to be released!

Paper Structure

This paper contains 14 sections, 8 equations, 3 figures, 11 tables.

Figures (3)

  • Figure 1: Pipeline of our method. For prediction (in red background), the anchor position $\boldsymbol{x}$ is used to query the hash grid to obtain the spatial condition $\boldsymbol{f_{c}}$. Then, $\boldsymbol{f_{c}}$ and the residual $\boldsymbol{f_{r}}$ are concatenated (denoted as $\boldsymbol{\oplus}$) and input to a Feature Prediction Network (FP-Net) to obtain the predicted feature $\boldsymbol{f_{p}}$, which is used along with the scale $\boldsymbol{l}$ and offsets $\boldsymbol{o}$ to generate Gaussians for rendering. For probability estimation (in blue background), the residual $\boldsymbol{f_{r}}$ is embedded into an instance-aware context $\boldsymbol{z}$ (i.e., the hyper prior) by the Instance-aware Context Encoder (IC-Encoder). $\boldsymbol{z}$ and the grid-captured $\boldsymbol{f_{c}}$ (which can also be thought of as a structure-aware context) are concatenated and fed into the Probability Estimation Network (PE-Net), which outputs the probability distribution of the anchor attributes. (Note that in our framework the feature $\boldsymbol{f}$ is removed from the anchor's attributes and replaced with the residual $\boldsymbol{f_{r}}$.)
  • Figure 2: Hyper prior encoding module. During training, the instance-aware context $\boldsymbol{z}$ is extracted from the residual. After training, $\boldsymbol{z}$ is encoded by an Arithmetic Encoder (AE) into a bitstream for storage and transmission. During decoding, the Arithmetic Decoder (AD) recovers $\boldsymbol{z}$ as a hyper prior for the residual. The probability distribution parameters and quantization step required by the AE and AD are derived from a learnable Cumulative Distribution Function (CDF), as in Ballé et al. (2018), and an adaptive quantization table.
  • Figure 3: Pre-experiment results. (a)-(d) respectively show the ground truth, the image rendered with the feature $\boldsymbol{f}$, the image rendered with the feature predicted using only the grid, and the difference between (b) and (c), denoted as residual information.
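
The data flow described in the Figure 1 caption can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the feature dimensions, layer sizes, random (untrained) weights, the fixed quantization step, and the Gaussian model used to estimate bits are all assumptions made for illustration, and `f_c` is drawn at random as a stand-in for the hash-grid query. Only the wiring follows the caption: $f_c \oplus f_r$ goes into FP-Net to give $f_p$, $f_r$ goes into the IC-Encoder to give $z$, and $z \oplus f_c$ goes into PE-Net to give distribution parameters for $f_r$.

```python
import math
import random


def concat(a, b):
    """Concatenate two feature vectors (the ⊕ operator in Figure 1)."""
    return list(a) + list(b)


def make_layer(rng, n_in, n_out):
    """Hypothetical dense layer with random weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp(x, layers):
    """Apply dense layers with ReLU between them (no ReLU after the last)."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(value, mu, sigma, step):
    """Bits to code a quantized value under an ASSUMED Gaussian model:
    -log2 of the probability mass inside the quantization bin."""
    p = gaussian_cdf(value + step / 2, mu, sigma) - gaussian_cdf(value - step / 2, mu, sigma)
    return -math.log2(max(p, 1e-12))


rng = random.Random(0)
DIM_C, DIM_R, DIM_Z, DIM_P = 4, 3, 2, 5  # illustrative sizes, not from the paper
STEP = 0.1                               # assumed fixed quantization step

# Stand-in for querying the hash grid at anchor position x.
f_c = [rng.uniform(-1.0, 1.0) for _ in range(DIM_C)]
# Quantized residual f_r (learned in the paper; random here).
f_r = [round(rng.gauss(0.0, 0.3) / STEP) * STEP for _ in range(DIM_R)]

# Prediction branch: f_p = FP-Net(f_c ⊕ f_r).
fp_net = [make_layer(rng, DIM_C + DIM_R, 8), make_layer(rng, 8, DIM_P)]
f_p = mlp(concat(f_c, f_r), fp_net)

# Probability branch: z = IC-Encoder(f_r); (mu, sigma) = PE-Net(z ⊕ f_c).
ic_encoder = [make_layer(rng, DIM_R, DIM_Z)]
z = mlp(f_r, ic_encoder)
pe_net = [make_layer(rng, DIM_Z + DIM_C, 8), make_layer(rng, 8, 2 * DIM_R)]
params = mlp(concat(z, f_c), pe_net)
mu = params[:DIM_R]
sigma = [math.exp(s) for s in params[DIM_R:]]  # exp keeps the scale positive

total_bits = sum(estimated_bits(v, m, s, STEP) for v, m, s in zip(f_r, mu, sigma))
print(f"predicted feature size: {len(f_p)}, estimated residual bits: {total_bits:.2f}")
```

Under this assumed Gaussian model, a residual value that falls near the predicted mean of a narrow distribution costs fewer bits than the same value under a wide one, which is the reason a sharper probability estimate lowers the bit rate.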