NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

Sicheng Li, Hao Li, Yiyi Liao, Lu Yu

TL;DR

NeRFCodec tackles memory-efficient NeRF scene representation by compressing plane-based features through a learned non-linear transform, implemented as a content-adaptive encoder and a tuned decoder head built on a pre-trained 2D neural image codec. It jointly optimizes rendering accuracy and bitrate via a rate–distortion objective, storing a latent code $\hat{y}$, a hyperprior $\hat{z}$, the decoder head $\boldsymbol{D_{\phi_2}}$, and feature residuals in the bitstream. The approach achieves a memory footprint of about $0.5$ MB per scene and outperforms prior NeRF compression methods in rate–distortion performance on both synthetic and real datasets. Limitations include the time cost of training the per-scene non-linear transform and the need to tailor a codec to each scene, which motivates future work on scaling up data and learning a generalized neural feature codec.

Abstract

The emergence of Neural Radiance Fields (NeRF) has greatly impacted 3D scene modeling and novel-view synthesis. Because NeRF is a form of visual media for 3D scene representation, compressing it with high rate-distortion performance is a perennial goal. Motivated by advances in neural compression and neural field representation, we propose NeRFCodec, an end-to-end NeRF compression framework that integrates non-linear transform, quantization, and entropy coding for memory-efficient scene representation. Since training a non-linear transform directly on a large collection of NeRF feature planes is impractical, we show that a pre-trained neural 2D image codec can be used to compress the features once content-specific parameters are added. Specifically, we reuse a neural 2D image codec but modify its encoder and decoder heads, while keeping the other parts of the pre-trained decoder frozen. This allows us to train the full pipeline under the supervision of a rendering loss and an entropy loss, balancing rate and distortion by updating only the content-specific parameters. At test time, the bitstreams containing the latent code, the feature decoder head, and other side information are transmitted. Experimental results demonstrate that our method outperforms existing NeRF compression methods, enabling high-quality novel view synthesis with a memory budget of 0.5 MB.
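
The trade-off between rendering loss and entropy loss described above can be illustrated with a minimal sketch. This is not the paper's exact objective: the function names, the use of mean squared error as the rendering loss, and the placement of the weight `lam` on the rate term are all assumptions made for illustration. The rate is estimated as the ideal entropy-coding cost, i.e. the sum of $-\log_2 p$ over the probabilities the entropy model assigns to the quantized symbols.

```python
import math


def estimated_bits(probabilities):
    """Ideal entropy-coding cost in bits: sum of -log2(p) over all symbols."""
    return sum(-math.log2(p) for p in probabilities)


def mse(rendered, target):
    """Mean squared error between rendered and ground-truth pixel values."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


def rate_distortion_loss(rendered, target, probabilities, lam):
    """Distortion plus lam-weighted rate (weight placement is an assumption)."""
    distortion = mse(rendered, target)
    rate = estimated_bits(probabilities)
    return distortion + lam * rate


# Toy values, chosen only to make the arithmetic easy to follow.
rendered = [0.5, 0.25]       # hypothetical rendered pixel values
target = [0.5, 0.75]         # hypothetical ground-truth pixel values
probabilities = [0.5, 0.25]  # hypothetical model probabilities of two symbols

loss = rate_distortion_loss(rendered, target, probabilities, lam=0.01)
print(f"bits={estimated_bits(probabilities):.1f} loss={loss:.3f}")
```

A larger `lam` penalizes bits more heavily and pushes the optimization toward smaller bitstreams at the cost of rendering quality; a smaller `lam` does the opposite.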

Paper Structure (14 sections, 11 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Compression performance.
  • Figure 2: NeRFCodec. We combine a pre-trained neural 2D image codec with content-specific parameters to compress a hybrid NeRF. The feature planes $\boldsymbol{x}$ are fed into the feature encoder $\boldsymbol{E_{\theta}}$ to obtain the latent code $\boldsymbol{y}$. In one branch, $\boldsymbol{y}$ is quantized into $\boldsymbol{\hat{y}}$. In the other branch, $\boldsymbol{y}$ is sent to the probability estimation module, which estimates the probability $P_{\boldsymbol{\hat{y}}}$ of the quantized latent code $\boldsymbol{\hat{y}}$ for entropy coding. Inside this module, a hyperprior encoder $\boldsymbol{H_{E}}$ produces the hyperprior latent code $\boldsymbol{z}$, and a hyperprior decoder $\boldsymbol{H_{D}}$ estimates the probability distribution $P_{\boldsymbol{\hat{y}}}$. The quantized latent code $\boldsymbol{\hat{y}}$ is fed into the feature decoder $\boldsymbol{D_{\phi}}$ to generate the reconstructed feature planes $\boldsymbol{\hat{x}}$. The feature decoder consists of a backbone $\boldsymbol{D_{\phi_{1}}}$ and a head $\boldsymbol{D_{\phi_{2}}}$. A feature compensation module makes up for the loss of high-frequency residuals: we add a feature residual matrix $\boldsymbol{M_{\Delta \boldsymbol{x}}}$, represented as the outer product of feature residual vectors $\boldsymbol{v_{\Delta \boldsymbol{x}}}$, to obtain the final feature planes $\boldsymbol{\tilde{x}}$. The final feature planes $\boldsymbol{\tilde{x}}$, together with a tiny MLP $f$, predict the color and density of sample points for volume rendering. The red components are updated in training, while the blue components inherit their parameters from the pre-trained neural image compression model and stay frozen. The final bitstreams include the quantized latent code $\boldsymbol{\hat{y}}$, the quantized hyperprior latent code $\boldsymbol{\hat{z}}$, the feature decoder head $\boldsymbol{D_{\phi_{2}}}$, the feature residual vectors $\boldsymbol{v_{\Delta \boldsymbol{x}}}$, tiny MLPs $f$, and metadata.
  • Figure 3: Spectrum analysis of decoded feature plane.
  • Figure 4: Memory-quality plot. We plot the memory footprint and visual quality score (PSNR) to compare our method with baseline methods. We use different markers for our method and each category of baseline: circles ($\bullet$) for parameter-efficient data structure methods, crosses ($\times$) for parameter compression methods, and stars ($\star$) for ours.
  • Figure 5: Qualitative comparison on NeRF-Synthetic.
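
The feature compensation step in the Figure 2 caption, where a residual matrix formed as an outer product of residual vectors is added to the decoded feature plane, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a single vector pair per plane is assumed, and the plane is a small nested list rather than a real feature tensor.

```python
def outer_product(col_vec, row_vec):
    """Rank-one matrix M[i][j] = col_vec[i] * row_vec[j]."""
    return [[c * r for r in row_vec] for c in col_vec]


def compensate(decoded_plane, col_vec, row_vec):
    """Add the rank-one residual matrix to the decoded feature plane."""
    residual = outer_product(col_vec, row_vec)
    return [
        [d + m for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(decoded_plane, residual)
    ]


# Hypothetical 2x3 decoded plane (x_hat) and residual vectors (v_delta_x).
decoded = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
col_vec = [0.5, -1.0]      # one entry per row of the plane
row_vec = [1.0, 0.0, 2.0]  # one entry per column of the plane

final_plane = compensate(decoded, col_vec, row_vec)

# Storage comparison: vector entries versus a dense residual of the same shape.
stored_values = len(col_vec) + len(row_vec)
dense_values = len(col_vec) * len(row_vec)
print(final_plane, stored_values, dense_values)
```

Storing the two vectors costs H + W values for an H×W plane instead of H·W for a dense residual, which is why the vectors, rather than the full matrix, are a natural fit for the bitstream.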