JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

TL;DR

Dynamic NeRFs for volumetric video are hard to store and stream because of their high storage and bitrate requirements. JointRF introduces an end-to-end framework that jointly optimizes a compact representation (a long-reference basis feature grid, plus per-frame residual and coefficient feature grids) and a sequential feature compression module, using simulated quantization and an entropy-based bitrate model. The method reports superior rate-distortion performance and a substantially smaller model size across multiple datasets, supporting the case for end-to-end joint optimization of dynamic radiance fields. This work enables more efficient streaming and storage of long-sequence volumetric content, with potential impact on immersive media and telepresence applications.

Abstract

Neural Radiance Field (NeRF) excels at photo-realistic rendering of static scenes, inspiring numerous efforts to extend it to volumetric video. However, rendering dynamic, long-sequence radiance fields remains challenging because of the large amount of data required to represent volumetric video. In this paper, we propose JointRF, a novel end-to-end joint optimization scheme for dynamic NeRF representation and compression, which achieves significantly better quality and compression efficiency than previous methods. Specifically, JointRF employs a compact residual feature grid and a coefficient feature grid to represent the dynamic NeRF. This representation handles large motions without compromising quality while also reducing temporal redundancy. We also introduce a sequential feature compression subnetwork to further reduce spatial-temporal redundancy. Finally, the representation and compression subnetworks are trained jointly, end to end, within JointRF. Extensive experiments demonstrate that JointRF achieves superior compression performance across various datasets.

Paper Structure (12 sections, 6 equations, 5 figures, 2 tables)

This paper contains 12 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The reconstructed results of our proposed JointRF compared to ReRF on various datasets.
  • Figure 2: Overview of JointRF representation. We divide the sequence into several GOFs. Each GOF starts with a keyframe, represented by a long-reference basis feature grid and a coefficient feature grid. Each subsequent frame in the GOF is then represented by a residual feature grid $\mathbf{R}_t$ in conjunction with a coefficient feature grid $\mathbf{C}_t$.
  • Figure 3: Overview of our JointRF training. First, we apply simulated quantization to generate $\mathbf{\hat{C}}_t$ and $\mathbf{\hat{R}}_t$, and then estimate the rate of $\mathbf{\hat{C}}_t$ and $\mathbf{\hat{R}}_t$ as loss during the forward pass. Next, we load the long-reference basis feature grid $\mathbf{\hat{B}}_k$ from the reconstructed keyframe buffer and combine it with $\mathbf{\hat{C}}_t$ and $\mathbf{\hat{R}}_t$ to obtain the MSE loss. Finally, we sequentially train each frame and update the residual feature grid, coefficient feature grid, and the corresponding entropy model.
  • Figure 4: Qualitative comparison against dynamic scene reconstruction methods and per-frame static reconstruction methods.
  • Figure 5: Rate-distortion curves on both the ReRF and DNA-Rendering datasets. The curves illustrate the contribution of the individual components of JointRF and demonstrate its superiority over ReRF.
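
The representation and training steps described in the captions of Figures 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all assumptions made for illustration: the combination rule F_t = C_t * B_k + R_t (element-wise), uniform noise as the simulated quantization, a zero-mean Gaussian as the entropy model, the weight `lam`, and every function name. Grids are flattened to plain lists, and the renderer is omitted, so a hypothetical `target` vector stands in for whatever the MSE is measured against.

```python
import math
import random


def reconstruct(basis, coeff, residual):
    # Assumed rule: F_t = C_t * B_k + R_t, applied element-wise.
    return [c * b + r for b, c, r in zip(basis, coeff, residual)]


def simulate_quantization(values, rng):
    # Training-time stand-in for rounding: add uniform noise
    # drawn from [-0.5, 0.5] to each value.
    return [v + rng.uniform(-0.5, 0.5) for v in values]


def _gaussian_cdf(x, scale):
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def estimate_bits(values, scale=1.0):
    # Assumed entropy model: a zero-mean Gaussian integrated over a
    # unit-width bin centred on each value; bits = -log2(probability).
    total = 0.0
    for v in values:
        p = _gaussian_cdf(v + 0.5, scale) - _gaussian_cdf(v - 0.5, scale)
        total += -math.log2(max(p, 1e-12))
    return total


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rd_loss(basis, coeff, residual, target, lam, rng):
    # Rate term: estimated bits of the noisy coefficient and residual grids.
    c_hat = simulate_quantization(coeff, rng)
    r_hat = simulate_quantization(residual, rng)
    rate = estimate_bits(c_hat) + estimate_bits(r_hat)
    # Distortion term: MSE of the reconstructed features against the target.
    distortion = mse(reconstruct(basis, c_hat, r_hat), target)
    return distortion + lam * rate


# Toy usage with hypothetical three-element "grids".
rng = random.Random(0)
basis = [1.0, 2.0, -1.0]
coeff = [1.0, 0.5, 2.0]
residual = [0.0, 0.3, -0.1]
target = reconstruct(basis, coeff, residual)
print(rd_loss(basis, coeff, residual, target, lam=0.01, rng=rng))
```

In an actual training loop these operations would run inside an automatic-differentiation framework, so that gradients of the combined loss reach the residual grid, the coefficient grid, and the entropy model, as the Figure 3 caption describes. The sketch only shows how the rate and distortion terms fit together.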