Rate-aware Compression for NeRF-based Volumetric Video

Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song

TL;DR

This paper learns a compact NeRF representation for volumetric video directly during training, using a proposed rate-aware compression framework, and introduces an adaptive quantization strategy that learns the optimal quantization step for the NeRF representation.

Abstract

Neural radiance fields (NeRF) have advanced the development of 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. To address these problems, existing solutions typically compress NeRF representations after the training stage, which separates representation training from compression. In this paper, we directly learn a compact NeRF representation for volumetric video in the training stage, based on the proposed rate-aware compression framework. Specifically, for volumetric video, we use a simple yet effective modeling strategy to reduce temporal redundancy in the NeRF representation. Then, during the training phase, an implicit entropy model is used to estimate the bitrate of the NeRF representation. This entropy model is then encoded into the bitstream to assist in decoding the NeRF representation. This approach enables precise bitrate estimation and thereby leads to a compact NeRF representation. Furthermore, we propose an adaptive quantization strategy and learn the optimal quantization step for the NeRF representation. Finally, the NeRF representation is optimized with a rate-distortion trade-off. Our proposed compression framework can be used with different representations, and experimental results demonstrate that our approach significantly reduces storage size with marginal distortion and achieves state-of-the-art rate-distortion performance for volumetric video on the HumanRF and ReRF datasets. Compared to the previous state-of-the-art method, TeTriRF, we achieve BD-rates of approximately -80% on the HumanRF dataset and -60% on the ReRF dataset.
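
The BD-rate figures quoted in the abstract express the average change in bitrate at equal quality, so a negative value means fewer bits for the same PSNR. As a rough illustration of how such a number is obtained, the sketch below averages the log-bitrate gap between two rate-distortion curves over their shared PSNR range. It is a simplified variant that assumes piecewise-linear interpolation instead of the cubic fit used in the standard Bjøntegaard procedure; the function names and the sample points are hypothetical and are not taken from the paper.

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    x = min(max(x, xs[0]), xs[-1])  # clamp to guard against rounding at the ends
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + w * (ys[i + 1] - ys[i])
    raise ValueError("x outside the interpolation range")

def bd_rate_percent(anchor, test, samples=200):
    """Average bitrate change (%) of `test` relative to `anchor` at equal PSNR.

    Each curve is a list of (bitrate, psnr) points with distinct PSNR values.
    A negative result means `test` needs fewer bits than `anchor`.
    """
    def prep(curve):
        pts = sorted(curve, key=lambda p: p[1])
        return [p[1] for p in pts], [math.log(p[0]) for p in pts]

    a_q, a_lr = prep(anchor)
    t_q, t_lr = prep(test)
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")

    # Mean log-rate difference over evenly spaced PSNR samples in the overlap.
    total = 0.0
    for k in range(samples + 1):
        q = lo + (hi - lo) * k / samples
        total += _interp(q, t_q, t_lr) - _interp(q, a_q, a_lr)
    mean_diff = total / (samples + 1)
    return (math.exp(mean_diff) - 1.0) * 100.0

# Hypothetical curves: the test codec uses half the bitrate at every PSNR level.
anchor_curve = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0)]
test_curve = [(50.0, 30.0), (100.0, 33.0), (200.0, 36.0)]
print(bd_rate_percent(anchor_curve, test_curve))
```

With these made-up curves the result is about -50, since the test curve needs half the bits everywhere. Read the same way, a BD-rate of -80% corresponds to roughly one fifth of the anchor's bitrate at matched quality.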

Paper Structure

This paper contains 17 sections, 7 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Comparison of compression performance with the state-of-the-art method, TeTriRF [wu2023tetrirf]. Compared to TeTriRF, our method achieves approximately 1 dB higher PSNR at a similar bitrate, and the BD-rate is -83%.
  • Figure 2: Demonstration of DiF representation.
  • Figure 3: The training pipeline of our proposed method. (a) First, the reconstructed representation $\hat{G}_{t-1}$ of the previous frame is retrieved from the decode buffer, and the residual representation $R_t$ is trained on top of it. (b) During training, adaptive quantization lets representations at different scales learn their optimal quantization step. In addition, the spatial-temporal context implicit entropy model estimates the bitrate of the explicit representation $R_t$. Finally, rate-distortion optimization is performed by combining the distortion loss and the rate loss.
  • Figure 4: Quantized explicit representation amplitude distribution histograms.
  • Figure 5: Illustration of the spatial-temporal implicit entropy model. The decoded spatial and temporal contexts are used to predict the distribution of the voxel to be encoded. Although the spatial context is actually searched in 3D space, the illustration uses a 2D plane for clarity.
  • ...and 4 more figures
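
The pipeline summarized in the Figure 3 caption (residual against the previous reconstruction, quantization with a step size, bitrate estimation, and a combined rate-distortion loss) can be illustrated with a toy example. This is only a sketch under strong assumptions: features are flat lists of floats, the step size is fixed rather than learned, the rate comes from the empirical symbol histogram instead of the paper's spatial-temporal implicit entropy model, and distortion is measured on the features themselves rather than on rendered images. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def quantize(values, step):
    """Uniform quantization: map each value to an integer index."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Map integer indices back to reconstructed values."""
    return [i * step for i in indices]

def empirical_bits(indices):
    """Total bits under the empirical symbol distribution.

    Stand-in for a learned entropy model; it ignores any context.
    """
    counts = Counter(indices)
    n = len(indices)
    return -sum(c * math.log2(c / n) for c in counts.values())

def code_frame(current, prev_recon, step, lam):
    """Code one frame as a quantized residual against the previous reconstruction.

    Returns (reconstruction, distortion, rate_in_bits, rd_loss),
    where rd_loss = distortion + lam * rate.
    """
    residual = [c - p for c, p in zip(current, prev_recon)]
    idx = quantize(residual, step)
    recon = [p + r for p, r in zip(prev_recon, dequantize(idx, step))]
    distortion = sum((c - r) ** 2 for c, r in zip(current, recon)) / len(current)
    rate = empirical_bits(idx)
    return recon, distortion, rate, distortion + lam * rate

# Hypothetical features for two consecutive frames.
prev = [0.0, 1.0, 2.0, 3.0]
cur = [0.1, 1.0, 2.4, 3.0]
for step in (0.5, 2.0):
    recon, d, r, loss = code_frame(cur, prev, step=step, lam=0.01)
    print(step, recon, d, r, loss)
```

A larger step lowers the estimated rate and raises the distortion, which is the trade-off that the weight `lam` balances. In the paper the step is learned per scale during training instead of being chosen by hand as it is here.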