How Far Can We Compress Instant-NGP-Based NeRF?

Yihang Chen; Qianyi Wu; Mehrtash Harandi; Jianfei Cai

TL;DR

The paper tackles the storage bottleneck of INGP-based NeRFs by introducing Context-based NeRF Compression (CNC), which learns entropy-aware context models to compress hash embeddings without sacrificing rendering speed or fidelity. It introduces level-wise and dimension-wise context modeling, leveraging hash collisions and occupancy grids as priors, and optimizes a combined distortion-entropy objective. CNC achieves large RD gains on Synthetic-NeRF and Tanks and Temples (up to $\sim86\%$ BD-rate reduction vs BiRF and over $100\times$ size reduction vs Instant-NGP), indicating strong practical potential for scalable NeRF deployment. The work also analyzes design choices and shows that entropy-driven compression can regularize NeRF models while maintaining visual quality, with reasonable training-time trade-offs and clear pathways for acceleration.
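The combined distortion-entropy objective mentioned above can be illustrated with a minimal sketch. It assumes binary embedding values in {-1, +1} (the Figure 4 caption refers to a binary NeRF model) and a context model that outputs, for each value, the probability `p` that it equals +1. The function names, the per-parameter averaging of the rate term, and the example numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimated_bits(theta, p, eps=1e-9):
    """Estimated code length in bits for binary values theta in {-1, +1},
    where p is the predicted probability that each value equals +1."""
    theta = np.asarray(theta, dtype=np.float64)
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    # Probability the model assigned to the value that actually occurred.
    prob_of_value = np.where(theta > 0, p, 1.0 - p)
    return -np.log2(prob_of_value)

def rd_loss(distortion, theta, p, lam):
    """Rate-distortion trade-off: distortion plus lam times the mean
    estimated bits per parameter (the averaging is an assumption)."""
    return distortion + lam * estimated_bits(theta, p).mean()

# Hypothetical toy values: four binary embedding entries.
theta = np.array([1, -1, 1, 1])
uninformed = np.full(4, 0.5)                # no context: 1 bit per value
informed = np.array([0.9, 0.1, 0.9, 0.9])   # accurate context prediction

bits_uninformed = estimated_bits(theta, uninformed).sum()
bits_informed = estimated_bits(theta, informed).sum()
loss = rd_loss(distortion=0.01, theta=theta, p=uninformed, lam=2e-3)
```

With an uninformed prediction each value costs exactly one bit. The better the context model predicts the actual value, the fewer bits are needed, which is why accurate probability prediction translates directly into smaller storage.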

Abstract

In recent years, Neural Radiance Field (NeRF) has demonstrated remarkable capabilities in representing 3D scenes. To expedite the rendering process, learnable explicit representations have been introduced for combination with the implicit NeRF representation, which, however, results in a large storage space requirement. In this paper, we introduce the Context-based NeRF Compression (CNC) framework, which leverages highly efficient context models to provide a storage-friendly NeRF representation. Specifically, we excavate both level-wise and dimension-wise context dependencies to enable probability prediction for information entropy reduction. Additionally, we exploit hash collision and occupancy grids as strong prior knowledge for better context modeling. To the best of our knowledge, we are the first to construct and exploit context models for NeRF compression. We achieve a size reduction of 100$\times$ and 70$\times$ with improved fidelity against the baseline Instant-NGP on the Synthetic-NeRF and Tanks and Temples datasets, respectively. Additionally, we attain 86.7\% and 82.3\% storage size reduction against the SOTA NeRF compression method BiRF. Our code is available here: https://github.com/YihangChen-ee/CNC.
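The abstract reports savings in two different units: a multiplicative factor against Instant-NGP and a percentage against BiRF. The short sketch below converts between the two so the figures can be compared on one scale. The helper names are hypothetical; only the four quoted numbers come from the abstract.

```python
def reduction_to_ratio(percent_reduction):
    """Convert 'X% smaller' into an 'N times smaller' factor."""
    return 1.0 / (1.0 - percent_reduction / 100.0)

def ratio_to_reduction(ratio):
    """Convert an 'N times smaller' factor into a percentage reduction."""
    return 100.0 * (1.0 - 1.0 / ratio)

# Figures quoted in the abstract.
birf_synthetic = reduction_to_ratio(86.7)   # roughly 7.5x smaller than BiRF
birf_tanks = reduction_to_ratio(82.3)       # roughly 5.6x smaller than BiRF
ingp_synthetic = ratio_to_reduction(100.0)  # 99% smaller than Instant-NGP
ingp_tanks = ratio_to_reduction(70.0)       # roughly 98.6% smaller
```

Read this way, the 86.7\% reduction against BiRF corresponds to a model about 7.5 times smaller, while the 100$\times$ reduction against Instant-NGP corresponds to a 99\% saving.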

Paper Structure (17 sections, 9 equations, 9 figures, 7 tables)

This paper contains 17 sections, 9 equations, 9 figures, 7 tables.

Introduction
Related work
Method
Preliminaries
Compress Embeddings with Context Model
Level-Wise Context Models
Hash Fusion with Occupancy Grid
Dimension-Wise Context Models
Experiments
Implementation Details
Performance Evaluation
Ablation Study
Fidelity Upper-Bound Influences Performance
Conclusion
More Implementation Details
...and 2 more sections

Figures (9)

Figure 1: Motivation of our work. Instant-NGP represents 3D scenes using 3D hash feature embeddings along with a rendering MLP. This requires non-negligible storage, with the embeddings accounting for over 99% of the total size (upper-left). To tackle this, we introduce context models that substantially compress the feature embeddings, built on three key technical components (bottom-left). Our approach achieves a size reduction of over 100$\times$ while simultaneously improving fidelity.
Figure 2: Overview of the proposed level-wise and dimension-wise context models. In the level-wise context model (dashed blue box), we first find the vertex $n_i$ of the feature vector $\theta_i$ using the hash function and then estimate its probability $p_i$ with a Context Fuser $\bm{C_p}$ that aggregates contexts from previously decoded levels. Although the illustration here is 2D, the same approach applies to 3D using trilinear interpolation. In the dimension-wise context models (dashed orange box), the last level of the 3D voxel grid is projected onto 2D planes to obtain the Projected Voxel Feature (PVF), which is then used for context interpolation. Deep-blue areas on the voxels indicate empty cells of the occupancy grid. At bottom-right (dashed black box), the formula of the entropy-based Bit Estimator $\bm{E_p}$ is provided, which is carefully designed to ensure a more efficient backward gradient.
Figure 3: Illustration of Hash Fusion. In this toy example, the resolutions of the voxel and occupancy grids are 12 and 7, respectively. The weight of each hash-collided vertex $k$ of $\theta_i$ is obtained by normalizing its AOE, $AOE_i^k$, which measures the intersection area between the vertex grid (semitransparent dashed red square around the vertex) and the occupied cells (light-colored).
Figure 4: Performance overviews and detailed local zoom-in results of our proposed CNC and other methods. We use a $\log_{10}$ x-axis on the overviews for better visualization and a linear x-axis on the zoom-in charts for better comparison. The closer a curve lies to the upper-left, the better the rate-distortion (RD) performance. Note that we achieve variable bitrates in our approach by changing $\lambda$ from $0.7\times10^{-3}$ to $8\times10^{-3}$, while BiRF achieves that by changing the feature dimension $F$ from 1 to 8. The dashed line "ours-upperbound" represents the upper fidelity bound of our binary NeRF model (i.e., $\lambda=0$).
Figure 5: Orange points represent ablation studies on $L_d$ from 6 to 12, with $\lambda=2\times10^{-3}$. Green points represent ablation studies on $L_c$ from 1 to 5, with $\lambda=1\times10^{-3}$. Best results are obtained at $L_d=12$ and $L_c=3$. Experiments are on the Synthetic-NeRF dataset.
...and 4 more figures
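
The Hash Fusion weighting described in the Figure 3 caption can be sketched for the 2D toy case. The sketch assumes the scene is normalized to the unit square, that the "vertex grid" is a square of side 1/(voxel resolution) centered on the vertex, and that weights fall back to uniform when no collided vertex touches an occupied cell. These choices, the function names, and the example occupancy pattern are assumptions for illustration, not the authors' code.

```python
import numpy as np

def aoe_2d(vertex_xy, voxel_res, occupancy):
    """Intersection area between the square around a vertex and the
    occupied cells of a square occupancy grid on the unit square.
    occupancy[i, j] is True when the cell at x-index i, y-index j is occupied."""
    occ = np.asarray(occupancy, dtype=bool)
    occ_res = occ.shape[0]
    half = 0.5 / voxel_res  # assumed half side length of the vertex grid
    x0, x1 = vertex_xy[0] - half, vertex_xy[0] + half
    y0, y1 = vertex_xy[1] - half, vertex_xy[1] + half
    edges = np.linspace(0.0, 1.0, occ_res + 1)
    # 1D overlap of the square with every occupancy cell along each axis.
    ox = np.clip(np.minimum(x1, edges[1:]) - np.maximum(x0, edges[:-1]), 0.0, None)
    oy = np.clip(np.minimum(y1, edges[1:]) - np.maximum(y0, edges[:-1]), 0.0, None)
    overlap = np.outer(ox, oy)  # overlap[i, j] = area shared with cell (i, j)
    return float((overlap * occ).sum())

def fusion_weights(aoe_values, eps=1e-12):
    """Normalize the AOE values of hash-collided vertices into weights."""
    aoe = np.asarray(aoe_values, dtype=np.float64)
    total = aoe.sum()
    if total <= eps:
        # Assumed fallback: treat all collided vertices equally.
        return np.full(aoe.shape, 1.0 / aoe.size)
    return aoe / total

# Toy setup matching the caption's resolutions (voxel 12, occupancy 7).
occupancy = np.zeros((7, 7), dtype=bool)
occupancy[2:5, 2:5] = True  # hypothetical occupied region near the center
collided_vertices = [(0.5, 0.5), (1.0 / 12, 1.0 / 12), (5.0 / 12, 0.5)]
aoes = [aoe_2d(v, 12, occupancy) for v in collided_vertices]
weights = fusion_weights(aoes)
```

Vertices whose surrounding square lies entirely in empty space get zero weight, so the context for a shared hash entry is dominated by the collided vertices that actually sit on occupied geometry.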
