4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting
Chun-Tin Wu, Jun-Cheng Chen
TL;DR
The paper tackles the memory and compute bottlenecks of dynamic scene rendering by introducing 4D-NVS, which decouples spatial structure from temporal dynamics using a persistent voxel grid that generates neural Gaussians on demand. Temporal evolution is modeled with a unified HexPlane 4D representation, while a selective deformation strategy updates only the geometric properties of each Gaussian and leaves appearance unchanged. Training proceeds in three stages: coarse initialization, fine temporal training, and an adaptive, view-aware refinement stage. The authors report superior image quality with significantly reduced memory and faster training compared to state-of-the-art methods, enabling real-time rendering on consumer GPUs. They also report strong quantitative results and robust ablations, suggesting practical impact for real-time VR/AR, embodied AI, and interactive digital content. Remaining limitations are large-motion artifacts and popping effects, which ongoing work aims to mitigate through improved motion modeling and voxel rasterization.
Abstract
Although 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gaussians across frames. To address this challenge, we propose 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles. Experiments demonstrate that our method outperforms state-of-the-art approaches with significant memory reduction and faster training, enabling real-time rendering with superior visual fidelity.
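The core idea above, one persistent set of voxels that yields Gaussians on demand, with a time-dependent deformation applied only to geometry, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Voxel`, `Gaussian`, `spawn_gaussians`, `toy_deformation`, and `gaussians_at_time` are hypothetical, the sinusoidal deformation stands in for the learned HexPlane-based field, and rotation is omitted for brevity.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    position: Vec3   # geometry: may change over time
    scale: Vec3      # geometry: may change over time
    color: Vec3      # appearance: kept fixed over time
    opacity: float   # appearance: kept fixed over time


@dataclass(frozen=True)
class Voxel:
    center: Vec3
    offsets: Tuple[Vec3, ...]  # assumption: stand-in for learned per-voxel offsets
    color: Vec3
    opacity: float


def spawn_gaussians(voxel: Voxel, voxel_size: float) -> List[Gaussian]:
    """Generate Gaussians on demand from one voxel instead of storing them."""
    gaussians = []
    for off in voxel.offsets:
        pos = tuple(c + voxel_size * o for c, o in zip(voxel.center, off))
        gaussians.append(
            Gaussian(pos, (voxel_size,) * 3, voxel.color, voxel.opacity)
        )
    return gaussians


def toy_deformation(position: Vec3, t: float) -> Tuple[Vec3, Vec3]:
    """Hypothetical stand-in for a learned deformation field.

    Returns (position delta, log-scale delta) for a point at time t.
    """
    dx = 0.1 * math.sin(2.0 * math.pi * t)
    return (dx, 0.0, 0.0), (0.0, 0.0, 0.0)


def deform(g: Gaussian, t: float) -> Gaussian:
    """Selective deformation: update geometry, leave color and opacity as-is."""
    dpos, dlog = toy_deformation(g.position, t)
    new_pos = tuple(p + d for p, d in zip(g.position, dpos))
    new_scale = tuple(s * math.exp(d) for s, d in zip(g.scale, dlog))
    return replace(g, position=new_pos, scale=new_scale)


def gaussians_at_time(voxels: List[Voxel], voxel_size: float, t: float) -> List[Gaussian]:
    """Same voxel set for every timestamp; only the deformation depends on t."""
    return [deform(g, t) for v in voxels for g in spawn_gaussians(v, voxel_size)]
```

In this sketch, memory scales with the number of voxels rather than with the number of frames, because `gaussians_at_time` can be called for any `t` against the same `voxels` list. How the actual method parameterizes the voxels and the deformation field is not specified in the abstract.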
