Neural NeRF Compression

Tuan Pham, Stephan Mandt

TL;DR

This work tackles the storage overhead of grid-based NeRF representations by introducing an encoder-free, per-scene nonlinear transform coding approach that compresses the three TensoRF-VM feature planes with a lightweight decoder. It leverages an importance-weighted loss to focus reconstruction on visually significant regions and introduces a masked, spike-and-slab entropy model to sparsify latent codes. Across four diverse datasets, the method achieves superior rate-distortion performance compared with prior grid-based compression baselines, while adding only minor overhead to rendering. The practical impact is substantial: enabling more compact NeRF models suitable for storage-constrained applications without sacrificing rendering quality.

Abstract

Neural Radiance Fields (NeRFs) have emerged as powerful tools for capturing detailed 3D scenes through continuous volumetric representations. Recent NeRFs utilize feature grids to improve rendering quality and speed; however, these representations introduce significant storage overhead. This paper presents a novel method for efficiently compressing a grid-based NeRF model, addressing the storage overhead concern. Our approach is based on the non-linear transform coding paradigm, employing neural compression to compress the model's feature grids. Due to the lack of training data involving many i.i.d. scenes, we design an encoder-free, end-to-end optimized approach for individual scenes, using lightweight decoders. To leverage the spatial inhomogeneity of the latent feature grids, we introduce an importance-weighted rate-distortion objective and a sparse entropy model employing a masking mechanism. Our experimental results validate that our proposed method surpasses existing works in terms of grid-based NeRF compression efficacy and reconstruction quality.
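
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of an importance-weighted distortion term combined with a masked, spike-and-slab style rate term. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Bernoulli mask with a single `keep_prob`, the discretized zero-mean Gaussian slab, and the trade-off weight `lam` are all hypothetical choices, and the rendering loss used in the paper is omitted.

```python
import numpy as np
from math import erf, sqrt

def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + erf(x / (scale * sqrt(2.0))))

def slab_bits(z, scale):
    """Bits for an integer z under a Gaussian discretized to unit-width bins (assumed slab)."""
    p = gaussian_cdf(z + 0.5, scale) - gaussian_cdf(z - 0.5, scale)
    return -np.log2(max(p, 1e-12))

def masked_rate_bits(z, mask, keep_prob, scale):
    """Bernoulli cost of every mask bit, plus slab cost for entries the mask keeps.

    Masked-out entries are treated as exactly zero (the spike), so they only
    pay for the mask bit itself.
    """
    total = 0.0
    for zi, mi in zip(z.ravel(), mask.ravel()):
        if mi:
            total += -np.log2(keep_prob) + slab_bits(int(zi), scale)
        else:
            total += -np.log2(1.0 - keep_prob)
    return float(total)

def weighted_distortion(target, recon, weights):
    """Squared error averaged with per-position importance weights."""
    return float(np.sum(weights * (target - recon) ** 2) / np.sum(weights))

def rd_objective(target, recon, weights, z, mask, keep_prob, scale, lam):
    """Hypothetical rate-distortion objective: weighted distortion + lam * rate."""
    rate = masked_rate_bits(z, mask, keep_prob, scale)
    return weighted_distortion(target, recon, weights) + lam * rate
```

In this sketch, positions with larger weights dominate the distortion term, and masked-out latent entries cost only their mask bit, which is what makes a sparse latent code cheap to store.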

Paper Structure (47 sections, 9 equations, 13 figures, 7 tables, 1 algorithm)

Figures (13)

  • Figure 1: Overview of our model. At training time (left), we learn the three latent codes $\{\mathbf{Z}_i\}_{i=1}^3$ to reconstruct the three frozen feature planes $\{\mathbf{P}_i\}_{i=1}^3$. The reconstructed feature planes $\{\hat{\mathbf{P}}_i\}_{i=1}^3$ are used to render the scene and calculate the rendering loss. The entropy model $P$ is used to calculate the rate loss and to compress the latent codes to a bitstring. At rendering time (right), we use $P$ to decompress the bitstring to latent codes $\{\hat{\mathbf{Z}}_i\}_{i=1}^3$ and then reconstruct the feature planes $\{\hat{\mathbf{P}}_i\}_{i=1}^3$.
  • Figure 2: Comparison of rate-distortion curves between our proposed methods and the baseline VQ-TensoRF on the Synthetic-NeRF dataset. The upper figure illustrates PSNR against file size, and the lower figure showcases SSIM in relation to file size.
  • Figure 3: Qualitative results on Chair and Mic scenes from the Synthetic-NeRF dataset. From left to right: uncompressed, VQ-TensoRF (average size 3.6 MB), ECTensoRF-L (3.4 MB), ECTensoRF-H (1.6 MB). Our decompressed renderings are barely distinguishable in quality from both uncompressed and VQ-compressed versions at a significantly reduced file size.
  • Figure 4: Rate-distortion comparison between traditional nonlinear transform coding (green), trained across 7 scenes, and our per-scene compression methods (orange, blue).
  • Figure 5: Ablation studies. Top: rate-distortion comparison of our approach against a version with encoder, trained on a single scene. Bottom: comparisons between model versions with factorized prior and without importance weight.
  • ...and 8 more figures
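
The encoder-free setup described in the Figure 1 caption can be illustrated with a toy example: the latent code is itself the optimized parameter, and a decoder maps it back to a frozen feature plane. This is a rough sketch under strong simplifying assumptions, not the paper's pipeline: the decoder here is a fixed random linear map `W` (the paper's lightweight decoder is learned), only a plain reconstruction loss is minimized (no rendering or rate loss), the array sizes are arbitrary, and rounding is used as an assumed quantizer for the rendering-time latent.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(8, 16))       # frozen feature plane (toy size: 8 channels, 16 positions)
W = rng.normal(size=(8, 4)) * 0.5  # stand-in linear decoder: 4 latent channels -> 8 feature channels
Z = np.zeros((4, 16))              # latent code, optimized directly with no encoder network

def decode(latent):
    """Map a latent code to a reconstructed feature plane."""
    return W @ latent

def sq_error(latent):
    """Sum of squared reconstruction errors against the frozen plane."""
    return float(np.sum((decode(latent) - P) ** 2))

initial_error = sq_error(Z)

# Gradient descent on Z itself; step size 1/L with L = 2 * ||W||_2^2
# guarantees that the squared error does not increase.
step = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
for _ in range(200):
    grad = 2.0 * W.T @ (decode(Z) - P)
    Z -= step * grad

final_error = sq_error(Z)

# Rendering time: quantize the latent (assumed rounding) and decode it.
Z_hat = np.round(Z)
P_hat = decode(Z_hat)
```

Because there is no encoder, nothing has to generalize across scenes: each scene gets its own latent code, fitted by optimization, which matches the per-scene motivation given in the abstract.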