4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang

TL;DR

4DGC addresses the challenge of efficiently streaming photorealistic Free-Viewpoint Video (FVV) by proposing a rate-aware compression framework for 4D Gaussian representations. It combines a motion-aware dynamic Gaussian model (a motion grid plus compensated Gaussians) with an end-to-end, differentiable compression scheme that jointly optimizes the representation and a tiny entropy model under a rate-distortion (RD) objective. The method achieves state-of-the-art RD performance, supports variable bitrates, and demonstrates substantial bitrate reductions (e.g., up to ~16x over prior work) while preserving rendering fidelity and speed. These results imply significant storage and bandwidth savings for FVV in AR/VR contexts, with practical gains in both training efficiency and real-time rendering.

Abstract

3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences. However, the vast number of Gaussians and their associated attributes poses significant challenges for storage and transmission. Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD) trade-off during training, leading to performance degradation and increased model redundancy. To address this gap, we propose 4DGC, a novel rate-aware 4D Gaussian compression framework that significantly reduces storage size while maintaining superior RD performance for FVV. Specifically, 4DGC introduces a motion-aware dynamic Gaussian representation that utilizes a compact motion grid combined with sparse compensated Gaussians to exploit inter-frame similarities. This representation effectively handles large motions, preserving quality and reducing temporal redundancy. Furthermore, we present an end-to-end compression scheme that employs differentiable quantization and a tiny implicit entropy model to compress the motion grid and compensated Gaussians efficiently. The entire framework is jointly optimized using a rate-distortion trade-off. Extensive experiments demonstrate that 4DGC supports variable bitrates and consistently outperforms existing methods in RD performance across multiple datasets.
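The rate-distortion objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the uniform-noise stand-in for quantization, the logistic entropy model, the MSE distortion, and all names (`simulated_quantize`, `estimated_bits`, `rd_loss`, `lam`) are assumptions chosen because they are common in learned compression. In a real training loop these operations would run inside an autodiff framework so that gradients reach both the representation and the entropy-model parameters.

```python
import numpy as np

def simulated_quantize(x, step, rng):
    """Training-time stand-in for rounding (assumed): add uniform noise
    in [-step/2, step/2) so the operation stays smooth in x."""
    return x + rng.uniform(-0.5, 0.5, size=x.shape) * step

def hard_quantize(x, step):
    """Inference-time quantization: snap to the nearest multiple of step."""
    return np.round(x / step) * step

def logistic_cdf(x, mean, scale):
    """CDF of a logistic distribution, used here as a toy entropy model."""
    return 1.0 / (1.0 + np.exp(-(x - mean) / scale))

def estimated_bits(x_hat, step, mean, scale):
    """Estimated code length: -log2 of the probability mass the entropy
    model assigns to each quantization bin centred on x_hat."""
    upper = logistic_cdf(x_hat + 0.5 * step, mean, scale)
    lower = logistic_cdf(x_hat - 0.5 * step, mean, scale)
    prob = np.clip(upper - lower, 1e-9, 1.0)
    return float(-np.log2(prob).sum())

def rd_loss(rendered, target, x_hat, step, mean, scale, lam):
    """Rate-distortion trade-off: distortion + lam * rate.
    MSE is an assumed distortion measure for this sketch."""
    distortion = float(np.mean((rendered - target) ** 2))
    rate = estimated_bits(x_hat, step, mean, scale)
    return distortion + lam * rate, distortion, rate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.normal(size=1000)           # stand-in for motion-grid values
    noisy = simulated_quantize(params, 0.25, rng)
    loss, d, r = rd_loss(np.zeros(8), np.zeros(8), noisy, 0.25, 0.0, 1.0, 1e-4)
    print(f"distortion={d:.4f} rate={r:.1f} bits loss={loss:.4f}")
```

In this sketch a larger `lam` penalizes rate more heavily, and a larger `step` widens each quantization bin so the estimated bit count drops at the cost of coarser values.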

Paper Structure

This paper contains 18 sections, 11 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Left: 4DGC results, showcasing flexible quality levels across various bitrates. Middle: Comparison of visual quality and bitrate with state-of-the-art methods. Right: The RD performance of our approach surpasses that of prior work (e.g., 3DGStream, ReRF, TeTriRF).
  • Figure 2: Illustration of the 4DGC Framework. The reconstructed Gaussians from the previous frame, $\hat{\mathbf{G}}_{t-1}$, are retrieved from the reference buffer and combined with the input images of the current frame to facilitate learning of the motion grid $\mathbf{M}_t$ and the compensated Gaussians $\Delta \mathbf{G}_t$ through a two-stage training process. In the first stage, the motion grid and its associated entropy model are optimized. In the second stage, the compensated Gaussians are refined along with their corresponding entropy model. Both stages are supervised by a rate-distortion trade-off, employing simulated quantization and an entropy model to jointly optimize representation and compression.
  • Figure 3: Illustration of our motion-aware dynamic Gaussian modeling that utilizes a multi-resolution motion grid $\mathbf{M}_t$ with sparse compensated Gaussians $\Delta \mathbf{G}_t$ to exploit inter-frame similarities.
  • Figure 4: Rate-distortion curves across different datasets. Rate-distortion curves not only illustrate the superiority of our method over ReRF, TeTriRF, and 3DGStream, but also demonstrate the efficiency of various components within our method.
  • Figure 5: Qualitative comparison on the N3DV and MeetRoom streaming datasets against ReRF, TeTriRF, and 3DGStream.
  • ...and 3 more figures
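
The frame-to-frame update described in the captions of Figures 2 and 3 (previous-frame Gaussians moved by a multi-resolution motion grid, then extended with sparse compensated Gaussians) can be sketched roughly as follows. This is an illustrative assumption, not the paper's code: it treats only Gaussian positions, assumes each grid level stores a 3D translation directly at its vertices over the unit cube, and sums the levels. The helper names `trilinear_sample`, `motion_offsets`, and `next_frame_positions` are hypothetical.

```python
import numpy as np

def trilinear_sample(grid, points):
    """Interpolate a (R, R, R, C) vertex grid over [0, 1]^3 at (N, 3) points."""
    res = grid.shape[0]
    coords = np.clip(points, 0.0, 1.0) * (res - 1)
    lo = np.minimum(np.floor(coords).astype(int), res - 2)  # lower corner index
    frac = coords - lo                                       # position inside the cell
    out = np.zeros((points.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                corner = grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

def motion_offsets(grids, positions):
    """Sum the offsets read from every resolution level (assumed combination rule)."""
    return sum(trilinear_sample(g, positions) for g in grids)

def next_frame_positions(prev_positions, grids, compensated_positions):
    """Move the previous frame's Gaussians, then append the compensated ones."""
    moved = prev_positions + motion_offsets(grids, prev_positions)
    return np.concatenate([moved, compensated_positions], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev = rng.uniform(size=(5, 3))                 # positions from frame t-1
    coarse = np.zeros((2, 2, 2, 3)); coarse[..., 0] = 0.1
    fine = np.zeros((4, 4, 4, 3)); fine[..., 1] = 0.2
    extra = rng.uniform(size=(2, 3))                # newly added Gaussians
    print(next_frame_positions(prev, [coarse, fine], extra).shape)
```

Under these assumptions, only the grid values and the few compensated Gaussians would need to be transmitted per frame, which is where the inter-frame redundancy is removed.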