NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li, Hao Li, Yiyi Liao, Lu Yu
TL;DR
NeRFCodec tackles memory-efficient NeRF scene representation by compressing plane-based features with a learned non-linear transform. It reuses a pre-trained 2D neural image codec, pairing a content-adaptive encoder with a fine-tuned decoder head. It jointly optimizes rendering accuracy and bitrate via a rate–distortion objective, and stores the latent code $\hat{y}$, the hyperprior latent $\hat{z}$, the decoder head $\boldsymbol{D_{\phi_2}}$, and feature residuals in the bitstream. The approach achieves a memory footprint of around $0.5$ MB per scene and outperforms prior NeRF compression methods in rate–distortion performance on both synthetic and real datasets. Limitations include the time cost of training the per-scene non-linear transform and the need to tailor a codec to each scene, which motivates future work on scaling up training data and learning a generalized neural feature codec.
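The rate–distortion objective can be illustrated with a minimal, framework-free sketch. It rests on assumptions that are common in hyperprior-based image codecs but are not confirmed by this summary. First, each latent in $\hat{y}$ is modeled by a Gaussian whose mean and scale would be predicted from the hyperprior. Second, rounding is replaced by additive uniform noise during training. Third, distortion is a pixel-wise MSE on rendered colors. All function names are hypothetical, and the cost of $\hat{z}$ is passed in as a precomputed bit count (`z_bits`).

```python
import math
import random

def quantize(y, training, rng):
    """Training: additive uniform noise as a proxy for rounding (assumed); test: hard rounding."""
    if training:
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(round(v)) for v in y]

def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def symbol_bits(y_hat, mu, sigma):
    """Bits for one latent under a Gaussian integrated over a unit-width bin.

    Assumption: mu and sigma would come from a hyperprior decoder.
    """
    upper = gaussian_cdf((y_hat + 0.5 - mu) / sigma)
    lower = gaussian_cdf((y_hat - 0.5 - mu) / sigma)
    prob = max(upper - lower, 1e-9)  # guard against log2(0)
    return -math.log2(prob)

def rate_bits(y_hat, mus, sigmas):
    """Estimated bit cost of the whole latent code."""
    return sum(symbol_bits(y, m, s) for y, m, s in zip(y_hat, mus, sigmas))

def mse(pred, target):
    """Assumed distortion: mean squared error of rendered vs. ground-truth colors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rd_loss(rendered, target, y_hat, mus, sigmas, z_bits, lam):
    """L = D + lam * R, where R includes the side-information bits for z_hat."""
    distortion = mse(rendered, target)
    rate = rate_bits(y_hat, mus, sigmas) + z_bits
    return distortion + lam * rate, distortion, rate

# Usage on toy numbers (not data from the paper)
rng = random.Random(0)
y_hat = quantize([0.2, 1.7, -2.4], training=False, rng=rng)  # -> [0.0, 2.0, -2.0]
loss, d, r = rd_loss([0.5, 0.4], [0.5, 0.5], y_hat, [0.0] * 3, [1.0] * 3, z_bits=8.0, lam=0.01)
```

Raising `lam` pushes the optimum toward fewer bits at the cost of rendering quality. A real implementation would operate on tensors and rely on an autograd framework to backpropagate through both terms.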
Abstract
The emergence of Neural Radiance Fields (NeRF) has greatly impacted 3D scene modeling and novel-view synthesis. As a form of visual media for 3D scene representation, NeRF calls for compression with high rate-distortion performance, a long-standing goal. Motivated by advances in neural compression and neural field representation, we propose NeRFCodec, an end-to-end NeRF compression framework that integrates a non-linear transform, quantization, and entropy coding for memory-efficient scene representation. Since training a non-linear transform directly on a large-scale collection of NeRF feature planes is impractical, we find that a pre-trained neural 2D image codec can be used to compress the features once content-specific parameters are added. Specifically, we reuse a neural 2D image codec but modify its encoder and decoder heads, while keeping the other parts of the pre-trained decoder frozen. This allows us to train the full pipeline under the supervision of a rendering loss and an entropy loss, achieving a rate-distortion balance by updating the content-specific parameters. At test time, the bitstream containing the latent code, the feature decoder head, and other side information is transmitted. Experimental results demonstrate that our method outperforms existing NeRF compression methods, enabling high-quality novel view synthesis with a memory budget of 0.5 MB.
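The split between content-specific and frozen parameters can be sketched as a small bookkeeping table. The module prefixes `encoder`, `decoder_head`, and `decoder_body` are hypothetical stand-ins, not the paper's actual layout. The sketch encodes only what the abstract states: the encoder and decoder head are adapted per scene, the rest of the pre-trained decoder stays frozen, and the decoder head is the network component sent in the bitstream alongside the latent code and side information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParamGroup:
    trainable: bool    # updated per scene by the rendering + entropy losses
    transmitted: bool  # written into the bitstream for the decoder side

# Hypothetical module prefixes; the real codec's module names may differ.
GROUPS = {
    "encoder": ParamGroup(trainable=True, transmitted=False),        # content-adaptive encoder
    "decoder_head": ParamGroup(trainable=True, transmitted=True),    # tuned decoder head
    "decoder_body": ParamGroup(trainable=False, transmitted=False),  # frozen pre-trained layers
}

def group_of(param_name: str) -> str:
    """Map e.g. 'decoder_head.conv.weight' to its group prefix."""
    prefix = param_name.split(".", 1)[0]
    if prefix not in GROUPS:
        raise KeyError(f"unknown parameter group: {prefix!r}")
    return prefix

def trainable_params(names):
    """Parameter names the optimizer should update for this scene."""
    return [n for n in names if GROUPS[group_of(n)].trainable]

def transmitted_params(names):
    """Network parameters that must travel with the latent code."""
    return [n for n in names if GROUPS[group_of(n)].transmitted]
```

In PyTorch, this partition would typically map to calling `requires_grad_(False)` on the frozen entries returned by `named_parameters()`, and to storing only the decoder-head entries of the `state_dict` in the bitstream. Because the frozen decoder body is shared by every scene, it never needs to be transmitted.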
