Fast Feedforward 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai

TL;DR

The paper tackles the storage burden of 3D Gaussian Splatting (3DGS) representations for novel view synthesis by introducing FCGS, a generalizable, optimization-free compression framework that operates in a single feed-forward pass. It combines a Multi-path Entropy Module (MEM), which selectively compresses color attributes while keeping geometry intact, with novel inter- and intra-Gaussian context models and a Gaussian Mixture Model that enable accurate entropy estimation without per-scene finetuning. Empirical results on the DL3DV-GS dataset show over 20× compression with high fidelity. The method also generalizes zero-shot to 3DGS produced by feed-forward models, encodes quickly, and remains compatible with pruning-based compression approaches. Overall, FCGS markedly accelerates 3DGS compression and broadens the practical adoption of explicit 3D representations for real-time rendering and storage-efficient applications.

Abstract

With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for its widespread adoption. Although various compression techniques have been proposed, previous art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths to balance size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20× while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods. Our code is available at: https://github.com/YihangChen-ee/FCGS.

Paper Structure

This paper contains 28 sections, 8 equations, 12 figures, 14 tables.

Figures (12)

  • Figure 1: Left: Existing compression methods require optimization of the existing 3DGS, leading to the drawback of being time-consuming for training. Our proposed FCGS overcomes this issue by compressing 3DGS representations in a single feed-forward pass, significantly reducing time consumption for compression. Right: Compared to LightGaussian, FCGS achieves improved RD performance while requiring much less execution time on the DL3DV-GS dataset.
  • Figure 2: Our approach is inspired by image compression, where the input Gaussian attributes $\bm{x}$ are mapped into the latent space as $\hat{\bm{y}}$ for compression after passing through an analysis transform $g_a$ and quantization to eliminate redundancies. To compress $\hat{\bm{y}}$, a hyperprior branch is introduced, using the coarse representation $\hat{\bm{z}}$ to estimate the distribution parameters of $\hat{\bm{y}}$ under a Gaussian distribution assumption, which aids in entropy encoding and decoding. In addition to the hyperprior, various context models are applied to $\hat{\bm{y}}$ to improve the estimation of distribution probabilities. After decoding $\hat{\bm{y}}$, a synthesis transform $g_s$ projects it back to the original space as $\hat{\bm{x}}$. A loss function is used to maintain high fidelity between $\hat{\bm{x}}$ and $\bm{x}$ using their rendered images, while minimizing the entropy of $\hat{\bm{y}}$ and $\hat{\bm{z}}$. AE and AD represent Arithmetic Encoding and Arithmetic Decoding, respectively. In our paper, we implement transform networks as simple MLPs. $g_a$ and $g_s$ consist of 4 layers each, while $h_a$ and $h_s$ have 3 layers each.
  • Figure 3: Context models of FCGS. We build on the hyperprior design of image compression (top) and introduce our inter- and intra-Gaussian context models (mid & bottom). Together, they form a GMM that provides a more accurate estimation of the value distribution probability of $\hat{\bm{y}}$ (right).
  • Figure 4: Performance comparison. Each scene is initially trained for $30K$ iterations to produce the vanilla 3DGS. Methods marked with * and circles are finetuned from this common 3DGS (our FCGS also compresses the same 3DGS); methods marked with ** and triangles are trained from scratch because they modify the 3DGS structure. We also present the runtime of our method and other approaches at the bottom of the figure (which is also reflected by the size of the marks), where our approach requires significantly less time for compression. The two runtimes reported for our method correspond to using multiple/single GPUs. Thanks to our optimization-free pipeline, we divide the 3DGS into chunks, with each chunk containing $1$ million Gaussians, allowing us to easily encode these chunks in parallel using multiple GPUs, further speeding up the process.
  • Figure 5: Qualitative comparison. We achieve substantial size reduction while preserving high fidelity. PSNR (dB) / SIZE (MB) are indicated in the bottom-right corner. We only present two baseline methods for qualitative comparison here due to space limitations. Please refer to Appendix Section \ref{sec:additional_qualitative} for more comprehensive qualitative comparisons.
  • ...and 7 more figures
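
As a rough illustration of the entropy estimation described in the captions of Figures 2 and 3, the sketch below computes the estimated bit cost of one quantized latent value under a Gaussian mixture. It is not the authors' implementation: the function names (`gaussian_cdf`, `gmm_bits`), the unit quantization step, and the use of three mixture components (one each for the hyperprior, inter-Gaussian, and intra-Gaussian branches) with hand-picked parameters are assumptions made purely for illustration.

```python
import math


def gaussian_cdf(v, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2) at v."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def gmm_bits(y_hat, weights, mus, sigmas, step=1.0, eps=1e-9):
    """Estimated bits needed to code one quantized value y_hat.

    The probability mass of y_hat is the mixture-weighted integral of each
    Gaussian over the quantization bin [y_hat - step/2, y_hat + step/2].
    `weights` are assumed to be non-negative and to sum to 1.
    """
    mass = 0.0
    for w, mu, sigma in zip(weights, mus, sigmas):
        upper = gaussian_cdf(y_hat + step / 2.0, mu, sigma)
        lower = gaussian_cdf(y_hat - step / 2.0, mu, sigma)
        mass += w * (upper - lower)
    # Clamp to avoid log(0) for values far from every component.
    return -math.log2(max(mass, eps))


if __name__ == "__main__":
    # Hypothetical parameters, standing in for what the three branches
    # (hyperprior, inter-Gaussian, intra-Gaussian) might predict.
    weights = [0.2, 0.3, 0.5]
    mus = [0.0, 1.0, -1.0]
    sigmas = [1.0, 0.5, 2.0]

    y = 2.3
    y_hat = float(round(y))  # quantization to the nearest integer
    print(f"quantized value: {y_hat}, estimated bits: {gmm_bits(y_hat, weights, mus, sigmas):.3f}")
```

Values that fall close to a component mean receive more probability mass and therefore fewer bits. An arithmetic coder driven by the same probabilities would approach this estimated cost.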