Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting

Hyeongmin Lee, Kyungjune Baek

TL;DR

The paper tackles the heavy storage burden of dynamic 4D Gaussian Splatting (4DGS) with an end-to-end rate-distortion (RD) optimized compression framework built on the Ex4DGS baseline. It compresses dynamic point trajectories with a Haar wavelet transform and combines mask-based parameter pruning with entropy-constrained vector quantization in a unified RD objective: $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{dist}} + \lambda_{\text{R}} \mathcal{L}_{\text{rate}} + \lambda_{\text{reg}} \mathcal{L}_{\text{reg}}$, where $\mathcal{L}_{\text{rate}} = \lambda_{\text{GSprune}}\mathcal{L}_{\text{GSprune}} + \lambda_{\text{SHprune}}\mathcal{L}_{\text{SHprune}} + \mathcal{L}_{\text{entropy}} + \mathcal{L}_{\text{VQ}}$. The approach achieves up to 91× compression relative to Ex4DGS while maintaining reasonable rendering fidelity, and it offers flexible rate–distortion trade-offs suited to both edge devices and high-performance systems. Experiments on the N3V and Technicolor datasets show substantial RD improvements; ablation studies guide parameter choices (e.g., avoiding aggressive variance quantization), and comparisons with concurrent methods such as Light4GS show favorable speed and size. By making dynamic Gaussian representations more compact and easier to transmit, the work advances practical volumetric video deployment and real-time rendering on diverse hardware. Improving high-fidelity performance and further compressing the dynamic components are identified as future directions.
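
The unified objective above can be made concrete with a small scalar sketch. Only the way the terms combine is taken from the formula; everything else is an assumption for illustration: the pruning terms are modeled as the mean sigmoid of hypothetical per-element mask logits (`mask_rate`), the entropy term as an estimated code length $-\sum \log_2 p$ (`entropy_bits`), and the defaults in `RDWeights` are placeholders rather than the paper's settings. A real training loop would compute these terms on autograd tensors, not Python floats.

```python
import math
from dataclasses import dataclass


@dataclass
class RDWeights:
    # Placeholder weights for illustration; not the paper's settings.
    lambda_r: float = 0.01
    lambda_reg: float = 0.001
    lambda_gsprune: float = 1.0
    lambda_shprune: float = 1.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mask_rate(mask_logits: list[float]) -> float:
    """Assumed pruning proxy: mean soft-mask value, i.e. the expected kept fraction."""
    return sum(sigmoid(m) for m in mask_logits) / len(mask_logits)


def entropy_bits(probs: list[float]) -> float:
    """Assumed entropy proxy: estimated code length -sum(log2 p) of coded symbols."""
    return -sum(math.log2(p) for p in probs)


def total_loss(l_dist: float, l_gsprune: float, l_shprune: float,
               l_entropy: float, l_vq: float, l_reg: float,
               w: RDWeights) -> float:
    """L_total = L_dist + lambda_R * L_rate + lambda_reg * L_reg, with
    L_rate = lambda_GSprune*L_GSprune + lambda_SHprune*L_SHprune + L_entropy + L_VQ."""
    l_rate = (w.lambda_gsprune * l_gsprune + w.lambda_shprune * l_shprune
              + l_entropy + l_vq)
    return l_dist + w.lambda_r * l_rate + w.lambda_reg * l_reg


if __name__ == "__main__":
    # Hypothetical inputs: four Gaussians' mask logits and three symbol probabilities.
    gs = mask_rate([2.0, -1.0, 0.5, -3.0])
    sh = mask_rate([1.0, 1.0, -2.0])
    bits = entropy_bits([0.5, 0.25, 0.125])
    print(total_loss(l_dist=0.02, l_gsprune=gs, l_shprune=sh,
                     l_entropy=bits, l_vq=0.01, l_reg=0.05, w=RDWeights()))
```

In RD-optimized training of this kind, sweeping `lambda_r` is the usual knob for trading model size against distortion; the sketch only shows how the weighted terms are assembled.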

Abstract

Dynamic 4D Gaussian Splatting (4DGS) extends the high-speed rendering of 3D Gaussian Splatting (3DGS) to volumetric videos. However, the large number of Gaussians, substantial temporal redundancy, and especially the absence of an entropy-aware compression framework lead to large storage requirements, which pose significant challenges for practical deployment, efficient edge-device processing, and data transmission. In this paper, we introduce a novel end-to-end rate-distortion (RD) optimized compression framework tailored for 4DGS, aiming to enable flexible, high-fidelity rendering across varied computational platforms. Using Fully Explicit Dynamic Gaussian Splatting (Ex4DGS), one of the state-of-the-art 4DGS methods, as our baseline, we build on existing 3DGS compression methods for compatibility while addressing the additional challenges introduced by the temporal axis. In particular, instead of storing each point's motion trajectory independently, we apply a wavelet transform that exploits the smoothness of real-world motion, substantially improving storage efficiency. The resulting framework achieves markedly higher compression ratios and gives users control over the balance between compression efficiency and rendering quality. Extensive experiments demonstrate the effectiveness of our method, achieving up to 91$\times$ compression compared to the original Ex4DGS model while maintaining high visual fidelity. These results highlight the applicability of our framework to real-time dynamic scene rendering in diverse scenarios, from resource-constrained edge devices to high-performance environments. The source code is available at https://github.com/HyeongminLEE/RD4DGS.
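
The smoothness prior behind the wavelet idea can be illustrated with a one-dimensional orthonormal Haar transform of a single coordinate of a point trajectory. This is a sketch under stated assumptions, not the RD4DGS code: the number of decomposition levels, the requirement that the trajectory length be divisible by $2^L$, and the hypothetical `sparsify` step (hard thresholding of detail coefficients, standing in for whatever quantization and entropy coding is applied to the coefficients) are illustrative choices. Because the transform is orthonormal, the squared reconstruction error equals the energy of the dropped coefficients, and smooth motion concentrates energy in the coarse approximation, so many detail coefficients can be discarded cheaply.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: list[float], levels: int) -> list[float]:
    """Multi-level orthonormal Haar DWT; len(signal) must be divisible by 2**levels.
    Output layout: [approx_L | detail_L | detail_(L-1) | ... | detail_1]."""
    if len(signal) % (1 << levels):
        raise ValueError("length must be divisible by 2**levels")
    coeffs = list(signal)
    n = len(coeffs)
    for _ in range(levels):
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = approx + detail  # next level only transforms the approximation
        n = half
    return coeffs


def haar_inverse(coeffs: list[float], levels: int) -> list[float]:
    """Inverse of haar_forward: merge approximation and detail bands coarse to fine."""
    out = list(coeffs)
    n = len(out) >> levels
    for _ in range(levels):
        merged: list[float] = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def sparsify(coeffs: list[float], levels: int, threshold: float) -> list[float]:
    """Hypothetical stand-in for coefficient coding: zero small detail coefficients."""
    n_approx = len(coeffs) >> levels
    kept = [c if abs(c) >= threshold else 0.0 for c in coeffs[n_approx:]]
    return coeffs[:n_approx] + kept


if __name__ == "__main__":
    # Hypothetical smooth x-coordinate of one point over 16 frames.
    traj = [math.sin(2 * math.pi * t / 32) for t in range(16)]
    coeffs = haar_forward(traj, levels=3)
    sparse = sparsify(coeffs, levels=3, threshold=0.05)
    recon = haar_inverse(sparse, levels=3)
    nonzero = sum(1 for c in sparse if c != 0.0)
    err = max(abs(a - b) for a, b in zip(traj, recon))
    print(f"{nonzero}/{len(traj)} coefficients kept, max error {err:.4f}")
```

Running the script reports how many of the 16 coefficients survive and the worst-case reconstruction error; raising `threshold` keeps fewer coefficients at the cost of larger error, a crude version of the rate–distortion trade-off.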

Paper Structure

This paper contains 24 sections, 13 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Average RD performance comparison between our method (Levels 1-6) and Ex4DGS [ex4dgs] on the Neural 3D Video (N3V) dataset. Per-scene details are provided in the quantitative results appendix.
  • Figure 2: Average RD performance comparison between our method (Levels 1-6) and Ex4DGS [ex4dgs] on the Technicolor dataset. Per-scene details are provided in the quantitative results appendix.
  • Figure 3: Qualitative results on the Technicolor dataset. We compare Ground Truth, Ex4DGS [ex4dgs], and ours at compression levels 6 and 1. PSNR (dB) / Size (MB) are shown below each image.
  • Figure 4: Qualitative results on the N3V dataset. We compare Ground Truth, Ex4DGS [ex4dgs], and ours at compression levels 6 and 1. PSNR (dB) / Size (MB) are shown below each image.
  • Figure 5: Per-scene RD performance comparison between our method (Levels 1-6) and Ex4DGS [ex4dgs] on the Neural 3D Video (N3V) dataset.
  • ...and 4 more figures