P-4DGS: Predictive 4D Gaussian Splatting with 90$\times$ Compression

Henan Wang, Hanxin Zhu, Xinliang Gong, Tianyu He, Xin Li, Zhibo Chen

TL;DR

P-4DGS tackles the storage bottleneck in 4D Gaussian Splatting by introducing a spatial-temporal prediction framework built on anchor-based covariant prediction and a deformation MLP for temporal dynamics. It further employs adaptive quantization and a context-aware entropy model to jointly optimize rate and distortion, achieving up to $40\times$ compression on synthetic scenes and up to $90\times$ on real-world scenes with minimal quality loss, along with the fastest rendering speed among the compared baselines. The approach demonstrates state-of-the-art rate-distortion (RD) performance on both synthetic (D-NeRF) and real-world (NeRF-DS) dynamic scenes, with storage of around $1$ MB on average. The work highlights practical impact for scalable dynamic scene reconstruction and real-time rendering, while identifying the deformation MLP’s fixed size as a limitation for ultra-low bitrate regimes and suggesting future enhancements in temporal compression.

Abstract

3D Gaussian Splatting (3DGS) has garnered significant attention due to its superior scene representation fidelity and real-time rendering performance, especially for dynamic 3D scene reconstruction (i.e., 4D reconstruction). However, despite achieving promising results, most existing algorithms overlook the substantial temporal and spatial redundancies inherent in dynamic scenes, leading to prohibitive memory consumption. To address this, we propose P-4DGS, a novel dynamic 3DGS representation for compact 4D scene modeling. Inspired by intra- and inter-frame prediction techniques commonly used in video compression, we first design a 3D anchor point-based spatial-temporal prediction module to fully exploit the spatial-temporal correlations across different 3D Gaussian primitives. Subsequently, we employ an adaptive quantization strategy combined with context-based entropy coding to further reduce the size of the 3D anchor points, thereby achieving enhanced compression efficiency. To evaluate the rate-distortion performance of our proposed P-4DGS in comparison with other dynamic 3DGS representations, we conduct extensive experiments on both synthetic and real-world datasets. Experimental results demonstrate that our approach achieves state-of-the-art reconstruction quality and the fastest rendering speed, with a remarkably low storage footprint (around 1 MB on average), achieving up to 40$\times$ and 90$\times$ compression on synthetic and real-world scenes, respectively.

Paper Structure

This paper contains 34 sections, 17 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Rendering pipeline of P-4DGS. The pipeline first performs spatial prediction by mapping anchor points in the canonical space to static Gaussian primitives via an anchor prediction module. Then, temporal prediction is conducted using a deformation MLP that maps these primitives to a target time step $t$, producing dynamic Gaussian primitives for final image rendering.
  • Figure 2: Entropy coding pipeline of P-4DGS, consisting of context generation and anchor compression. In context generation, anchor positions $x_a$ query a hash grid to produce a feature $h$, which an MLP maps to a quantization step $q$, mean $\mu$, and standard deviation $\sigma$. In anchor compression, the attributes $s,l,f,O$ are adaptively quantized using $q$ and encoded into a bitstream via context-based entropy coding with $\mu,\sigma$.
  • Figure 3: Rate-distortion curves on D-NeRF and NeRF-DS datasets. The x-axis shows the bitrate (log scale) of compressed Gaussian representations, and the y-axes report average PSNR, SSIM, and LPIPS. Our method achieves high reconstruction quality across bitrates, outperforming D3DGS, 4DHexPlane, and 4DGS, with over 40$\times$ and 90$\times$ compression on D-NeRF and NeRF-DS, respectively.
  • Figure 4: Quantitative comparisons on D-NeRF and NeRF-DS datasets. Our method achieves high-fidelity rendering with significantly lower storage (around 1 MB) compared to D3DGS, 4DHexPlane, and 4DGS. It faithfully reconstructs dynamic scenes with minimal artifacts, while baseline methods show visible degradation such as blur or loss of dynamic details.
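
The anchor compression step described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: it assumes uniform scalar quantization with step $q$ and a Gaussian probability model integrated over each quantization bin, which is a common choice in learned compression. The function names (`quantize`, `dequantize`, `estimated_bits`) are hypothetical, and $q$, $\mu$, $\sigma$ are passed in as plain numbers rather than predicted by a hash grid and MLP.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quantize(value, q):
    """Uniform quantization with step q (assumed scheme): integer symbol index."""
    return round(value / q)


def dequantize(symbol, q):
    """Map an integer symbol back to its reconstruction value."""
    return symbol * q


def estimated_bits(value, q, mu, sigma, eps=1e-9):
    """Estimated code length (in bits) of one quantized attribute value.

    The probability of the symbol is the Gaussian mass falling inside the
    quantization bin [v_hat - q/2, v_hat + q/2]; an ideal entropy coder
    would spend about -log2(p) bits on it.
    """
    v_hat = dequantize(quantize(value, q), q)
    p = gaussian_cdf(v_hat + q / 2, mu, sigma) - gaussian_cdf(v_hat - q / 2, mu, sigma)
    return -math.log2(max(p, eps))  # eps guards against log2(0)


# Hypothetical context values, standing in for the MLP outputs (q, mu, sigma).
bits_near_mean = estimated_bits(0.1, q=0.1, mu=0.0, sigma=1.0)
bits_far_from_mean = estimated_bits(3.0, q=0.1, mu=0.0, sigma=1.0)
bits_coarse_step = estimated_bits(0.1, q=0.5, mu=0.0, sigma=1.0)
```

Under these assumptions, a value close to the predicted mean costs fewer bits than an outlier, and a larger step $q$ lowers the bit cost at the price of a larger reconstruction error (bounded by $q/2$). This is the trade-off that a per-anchor quantization step and a context-predicted $\mu,\sigma$ are meant to exploit.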