VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang

TL;DR

VRVVC tackles the high data burden of NeRF-based volumetric video with a single end-to-end model capable of variable-rate compression. It combines a compact tri-plane residual representation for inter-frame modeling with a variable-rate entropy coding scheme that pairs learnable quantization parameters with Lagrange multipliers, all trained through a two-stage progressive strategy guided by a multi-rate-distortion loss. The approach yields wide bitrate adaptability and significant RD gains over state-of-the-art fixed-rate methods, demonstrated on the ReRF and DNA-Rendering datasets with substantial BD-BR savings and faster rendering. By enabling flexible bitrate control without training multiple models, this work advances the practical delivery of photorealistic volumetric video.

Abstract

Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression independently or focus on a single fixed rate-distortion (RD) tradeoff. In this paper, we propose VRVVC, a novel end-to-end joint optimization variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance. Specifically, VRVVC introduces a compact tri-plane implicit residual representation for inter-frame modeling of long-duration dynamic scenes, effectively reducing temporal redundancy. We further propose a variable-rate residual representation compression scheme that leverages a learnable quantization and a tiny MLP-based entropy model. This approach enables variable bitrates through the utilization of predefined Lagrange multipliers to manage the quantization error of all latent representations. Finally, we present an end-to-end progressive training strategy combined with a multi-rate-distortion loss function to optimize the entire framework. Extensive experiments demonstrate that VRVVC achieves a wide range of variable bitrates within a single model and surpasses the RD performance of existing methods across various datasets.
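To make the idea of a multi-rate-distortion loss more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each predefined Lagrange multiplier is paired with its own quantization step (fixed numbers here, learnable in the paper), that the per-rate objective has the form $R + \lambda D$, that distortion is measured directly on a toy latent vector rather than on rendered images, and that a fixed logistic prior stands in for the tiny MLP-based entropy model. All function names and numeric values are hypothetical.

```python
import math

def logistic_cdf(x, scale=1.0):
    """CDF of a zero-mean logistic distribution (assumed stand-in prior)."""
    return 1.0 / (1.0 + math.exp(-x / scale))

def quantize(values, step):
    """Uniform scalar quantization with the given step size."""
    return [round(v / step) * step for v in values]

def estimate_bits(values, step, scale=1.0):
    """Estimated code length: -log2 of the prior mass in each quantization bin."""
    bits = 0.0
    for v in values:
        k = round(v / step)
        p = (logistic_cdf((k + 0.5) * step, scale)
             - logistic_cdf((k - 0.5) * step, scale))
        bits += -math.log2(max(p, 1e-12))
    return bits

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def multi_rate_distortion_loss(latent, lambdas, steps):
    """Sum the assumed objective R_i + lambda_i * D_i over all rate points."""
    total = 0.0
    per_rate = []
    for lam, step in zip(lambdas, steps):
        rate = estimate_bits(latent, step)
        dist = mse(latent, quantize(latent, step))
        per_rate.append((rate, dist))
        total += rate + lam * dist
    return total, per_rate

# Hypothetical latent and (lambda, step) pairs: a small lambda goes with a
# coarse step (low bitrate), a large lambda with a fine step (high bitrate).
latent = [0.3, -1.2, 2.7, 0.05, -0.8]
lambdas = [0.5, 4.0, 32.0]
steps = [2.0, 0.5, 0.1]

total, per_rate = multi_rate_distortion_loss(latent, lambdas, steps)
for lam, (rate, dist) in zip(lambdas, per_rate):
    print(f"lambda={lam:5.1f}  bits={rate:6.2f}  mse={dist:.4f}")
```

In this toy setting, moving from the coarse step to the fine step raises the estimated bit count and lowers the distortion, which is the trade-off a single variable-rate model has to cover at every rate point simultaneously.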

Paper Structure

This paper contains 11 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Left: Our proposed VRVVC efficiently compresses volumetric video at variable bitrates using a single model. Middle: We demonstrate two examples of reconstruction quality at a bitrate of 60 KB per frame. Right: The RD performance of our approach surpasses prior work (e.g., ReRF, TeTriRF).
  • Figure 2: Illustration of our VRVVC framework. We employ a compact tri-plane residual representation for inter-frame modeling of long-duration dynamic scenes. The residuals are encoded into several bitstreams in an MLP-based entropy model that utilizes the RD tradeoff parameter $\lambda$ to achieve variable bitrates within a single model.
  • Figure 3: Overview of our progressive training. In the first stage, we adopt the reconstructed features $\hat{\mathbf{F}}_{t-1}$ from the previous frame, retrieved from the decoded buffer, to train the current frame's low-resolution residual features. In the second stage, these features are reused as an effective initialization for further training, where they are integrated with a variable-rate entropy coding model for joint optimization. The entire training process is supervised by the multi-rate-distortion loss $\mathcal{L}_s$.
  • Figure 4: Qualitative comparison against the volumetric video coding methods K-Planes, ReRF, TeTriRF, and JointRF.
  • Figure 5: The RD performance comparison results on the ReRF and DNA-Rendering datasets.
  • ...and 2 more figures
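
The Figure 2 and Figure 3 captions describe inter-frame residual features that are built on the reconstructed features $\hat{\mathbf{F}}_{t-1}$ held in a decoded buffer. The sketch below illustrates only that closed-loop structure and rests on several assumptions: features are short lists of floats instead of tri-planes, the residual is obtained by plain subtraction instead of being learned, quantization uses one fixed step, and the buffer starts at zero instead of a separately coded first frame. The function names are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantization with the given step size."""
    return [round(v / step) * step for v in values]

def encode_sequence(frames, step):
    """Code each frame as a quantized residual against the decoded buffer."""
    buffer = [0.0] * len(frames[0])   # assumed initial state of the decoded buffer
    residuals_q, recon = [], []
    for feat in frames:
        # Residual relative to what the decoder will actually hold.
        residual = [f - b for f, b in zip(feat, buffer)]
        r_hat = quantize(residual, step)
        # Update the buffer exactly as the decoder would.
        buffer = [b + r for b, r in zip(buffer, r_hat)]
        residuals_q.append(r_hat)
        recon.append(buffer)
    return residuals_q, recon

def decode_sequence(residuals_q):
    """Rebuild every frame by accumulating the quantized residuals."""
    buffer = [0.0] * len(residuals_q[0])
    out = []
    for r_hat in residuals_q:
        buffer = [b + r for b, r in zip(buffer, r_hat)]
        out.append(buffer)
    return out

# Hypothetical three-frame feature sequence with small temporal changes.
frames = [[1.0, 2.0, -0.5], [1.1, 2.05, -0.4], [1.3, 1.9, -0.45]]
step = 0.25

residuals_q, recon = encode_sequence(frames, step)
decoded = decode_sequence(residuals_q)
for t, (orig, rec) in enumerate(zip(frames, decoded)):
    err = max(abs(o - r) for o, r in zip(orig, rec))
    print(f"frame {t}: max abs error = {err:.3f}")
```

Because each residual is taken against the reconstructed previous frame rather than the original one, the quantization error stays within half a step per frame in this toy setting and does not accumulate over the sequence.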