HPC: Hierarchical Progressive Coding Framework for Volumetric Video

Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

TL;DR

HPC tackles the data-volume burden of NeRF-based volumetric video with a hierarchical progressive coding framework that represents dynamic scenes as a multi-resolution residual radiance field. A single model supports multiple bitrate/quality levels through GoF-based frame residuals and level-wise encoding, enabling variable bitrate and progressive streaming without retraining. End-to-end training combines simulated quantization with a rate-distortion (RD) objective computed via a learned entropy model, and a progressive strategy explicitly supervises each successively higher resolution level to improve RD performance. Experiments on multiple datasets show that HPC delivers scalable quality with competitive RD performance, outperforming fixed-bitrate baselines and enabling flexible streaming under varying network and device constraints.
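
The training recipe summarized above can be illustrated with a minimal sketch. It assumes additive uniform noise as the training-time stand-in for rounding and a fixed zero-mean logistic prior in place of the learned entropy model; neither choice is confirmed by this summary, and the names `simulate_quantization`, `estimate_bits`, and `rd_loss` are hypothetical.

```python
import math
import random

def simulate_quantization(values, rng):
    """Training-time stand-in for rounding: add uniform noise in [-0.5, 0.5]."""
    return [v + rng.uniform(-0.5, 0.5) for v in values]

def logistic_cdf(x, scale):
    return 1.0 / (1.0 + math.exp(-x / scale))

def estimate_bits(values, scale=1.0):
    """Code length in bits under a zero-mean logistic prior (an assumed prior).

    Each value is charged -log2 of the probability mass of its unit-width bin.
    """
    total = 0.0
    for v in values:
        p = logistic_cdf(v + 0.5, scale) - logistic_cdf(v - 0.5, scale)
        total += -math.log2(max(p, 1e-9))
    return total

def rd_loss(distortion, bits, lam):
    """Rate-distortion objective: distortion plus lambda-weighted rate."""
    return distortion + lam * bits

rng = random.Random(0)
residuals = [0.2, -1.7, 3.1, 0.0]
noisy = simulate_quantization(residuals, rng)   # used while training
rounded = [float(round(v)) for v in residuals]  # used at encode time
```

The noise keeps the objective differentiable during training, while actual rounding is applied only when the bitstream is produced. Residuals close to zero cost fewer estimated bits, which is why the rate term pushes the model toward small residuals.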

Abstract

Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression methods lack the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework that achieves variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. We then propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize the hierarchical representation and its compression. Trained only once, HPC can realize multiple compression levels, whereas current methods must train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate using a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.
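
One plausible reading of the multi-rate-distortion loss is a sum of per-level rate-distortion terms. The form below is an illustrative assumption, not the paper's stated equation: $D^l$ (distortion at level $l$), $R^l$ (estimated rate of the residuals up to level $l$), $\lambda^l$ (trade-off weight), and $N$ (number of levels) are hypothetical symbols.

```latex
L^{l} = D^{l} + \lambda^{l} R^{l}, \qquad
L_{\mathrm{total}} = \sum_{l=1}^{N} L^{l}
```

Under this reading, every level receives its own supervision signal, so a decoder that stops at any level $l \le N$ still obtains a reconstruction that was explicitly optimized for that bitrate.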

Paper Structure (20 sections, 10 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: Illustration of our HPC framework. In progressive encoding, the residual grid network takes images $\mathbf{I}_t$ and the previously reconstructed feature grids $\hat{\mathbf{F}}_{t-1}$ as input and generates multi-resolution residuals $\mathbf{R}_t$. After quantization $\mathrm{Q}$, the residuals are encoded into a bitstream $B_t$ by the entropy encoder $\mathbf{E}$. During progressive decoding, the residuals are decoded from the bitstream and recursively integrated with the prior reference grids to recover the current frame's features layer by layer.
  • Figure 2: The multi-layered feature grids for subsequent frames $\mathbf{F}_t$ can be recursively reconstructed by layer-wise accumulation of residuals $\mathbf{R}_t$.
  • Figure 3: Overview of our hierarchical progressive training. We generate feature grids at different resolutions $\{\mathbf{R}_t^l\}$ from the current frame's images and the previous reference features $\hat{\mathbf{F}}_{t-1}$ held in the buffer. Training starts on the lowest-resolution grids ($l = 1$). As training advances, it progressively incorporates the higher-resolution grids of the next level, supervising each level with the multi-rate-distortion loss $L^l$. After training, the learned feature grids $\{\hat{\mathbf{F}}_{t}^l\}$ are stored in the reference frame buffer.
  • Figure 4: Qualitative comparison against the volumetric video coding methods TiNeuVox, K-Planes, ReRF, and TeTriRF.
  • Figure 5: Rate-distortion curves on the ReRF and DNA-Rendering datasets. The curves illustrate the effectiveness of the individual components of our method and demonstrate its superiority over ReRF and TeTriRF.
  • ...and 3 more figures
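
The progressive decoding described in the captions for Figures 1 and 2 can be sketched as follows. This is a simplified illustration under stated assumptions: each level's grid is rebuilt by adding the decoded residual to the previous frame's reconstruction at the same level, grids are flat lists of floats rather than 3D tensors, and the function names `decode_frame` and `decode_group` are hypothetical. How the levels are combined for rendering is not covered by the captions and is left out.

```python
def decode_frame(reference_levels, residual_levels, num_levels):
    """Rebuild the first `num_levels` feature grids of the current frame.

    reference_levels[l]: reconstructed grid of the previous frame at level l.
    residual_levels[l]:  decoded residual of the current frame at level l.
    """
    if num_levels > len(residual_levels):
        raise ValueError("not enough residual levels decoded")
    current = []
    for level in range(num_levels):
        ref = reference_levels[level]
        res = residual_levels[level]
        current.append([f + r for f, r in zip(ref, res)])
    return current

def decode_group(key_levels, residuals_per_frame, num_levels):
    """Decode consecutive frames, carrying each reconstruction forward as reference."""
    frames = [[list(grid) for grid in key_levels[:num_levels]]]
    for residual_levels in residuals_per_frame:
        frames.append(decode_frame(frames[-1], residual_levels, num_levels))
    return frames

# Toy data: two levels, the second at a higher resolution (more entries).
key_levels = [[1.0, 2.0], [0.5, 0.5, 0.5, 0.5]]
residuals_per_frame = [
    [[0.5, -1.0], [0.0, 0.25, 0.0, 0.0]],
    [[0.5, 0.0], [0.0, 0.0, -0.5, 0.0]],
]
low_quality = decode_group(key_levels, residuals_per_frame, num_levels=1)
full_quality = decode_group(key_levels, residuals_per_frame, num_levels=2)
```

A client with limited bandwidth can request only the first level of each frame, while a client with more capacity decodes additional levels from the same bitstream. The base-level result is identical in both cases, which is what makes the scheme progressive.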