4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting

Chun-Tin Wu, Jun-Cheng Chen

TL;DR

The paper tackles the memory and compute bottlenecks of dynamic scene rendering with 4D-NVS, which decouples spatial structure from temporal dynamics: a persistent voxel grid generates neural Gaussians on demand instead of storing a separate Gaussian set for every frame. Temporal evolution is modeled with a unified HexPlane 4D representation, and a selective deformation strategy updates only the geometric properties of each Gaussian, leaving appearance unchanged. Training proceeds in three stages (coarse initialization, fine temporal training, and an adaptive, view-aware refinement stage for underperforming viewpoints). Compared with state-of-the-art methods, the result is higher image quality with significantly lower memory use and faster training, enabling real-time rendering on consumer GPUs. Quantitative results and ablations support these claims and point to practical uses in real-time VR/AR, embodied AI, and interactive digital content. Remaining limitations are artifacts under large motion and popping effects, which future work aims to address through improved motion modeling and voxel rasterization.
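The HexPlane query and the geometry-only deformation described above can be illustrated with a minimal NumPy sketch. It assumes positions normalized to [0, 1]^3 and time normalized to [0, 1], a single resolution level, and features fused across the six planes (xy, xz, yz, xt, yt, zt) by elementwise product, a K-Planes-style choice; the paper's exact fusion rule and deformation decoder are not given here, so a hypothetical linear map `W` stands in for the network. The function names are illustrative. Only `xyz`, `scale`, and `rot` receive offsets, while `color` and `opacity` pass through untouched, mirroring the selective deformation idea.

```python
import numpy as np

# Axis pairs for the six HexPlane grids over (x, y, z, t): xy, xz, yz, xt, yt, zt.
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear(grid, uv):
    """Bilinearly sample a (R, R, C) grid at normalized coords uv in [0, 1]^2, shape (N, 2)."""
    res = grid.shape[0]
    p = np.clip(uv, 0.0, 1.0) * (res - 1)          # continuous cell coordinates
    i0 = np.floor(p).astype(int)
    i1 = np.minimum(i0 + 1, res - 1)                # clamp the upper neighbor at the border
    w = p - i0                                      # fractional offsets in [0, 1]
    wu, wv = w[:, :1], w[:, 1:]
    g00 = grid[i0[:, 0], i0[:, 1]]
    g10 = grid[i1[:, 0], i0[:, 1]]
    g01 = grid[i0[:, 0], i1[:, 1]]
    g11 = grid[i1[:, 0], i1[:, 1]]
    return (g00 * (1 - wu) * (1 - wv) + g10 * wu * (1 - wv)
            + g01 * (1 - wu) * wv + g11 * wu * wv)


def hexplane_features(planes, xyzt):
    """Fuse the six plane samples by elementwise product (assumed fusion rule)."""
    feat = np.ones((xyzt.shape[0], planes[0].shape[-1]))
    for grid, (a, b) in zip(planes, PLANE_AXES):
        feat *= bilinear(grid, xyzt[:, [a, b]])
    return feat


def deform_geometry(gaussians, planes, W, t):
    """Offset only position, (log-)scale and rotation at time t; appearance is reused as-is.

    W is a hypothetical (C, 10) linear decoder: 3 position + 3 scale + 4 quaternion deltas.
    """
    n = gaussians["xyz"].shape[0]
    xyzt = np.concatenate([gaussians["xyz"], np.full((n, 1), t)], axis=1)
    delta = hexplane_features(planes, xyzt) @ W
    out = dict(gaussians)                           # shallow copy: color/opacity arrays shared
    out["xyz"] = gaussians["xyz"] + delta[:, 0:3]
    out["scale"] = gaussians["scale"] + delta[:, 3:6]
    rot = gaussians["rot"] + delta[:, 6:10]
    out["rot"] = rot / np.linalg.norm(rot, axis=1, keepdims=True)  # keep unit quaternions
    return out
```

Because the appearance arrays are shared rather than recomputed, a scene with many timestamps stores appearance once; only the small geometric offsets depend on time.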

Abstract

Although 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gaussians across frames. To address this challenge, we propose 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles. Experiments demonstrate that our method outperforms state-of-the-art approaches with significant memory reduction and faster training, enabling real-time rendering with superior visual fidelity.
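One plausible reading of "selectively improves challenging viewpoints through targeted optimization" is: score every training view after the fine stage, flag the lowest-scoring ones, and oversample them during a short refinement phase. The sketch below does this with PSNR as the score. The selection fraction, the boost factor, and all function names are hypothetical, and the paper's actual selection criterion and its adaptive densification step are not reproduced here.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((pred - gt) ** 2))
    return 10.0 * np.log10(max_val ** 2 / max(mse, 1e-12))  # clamp avoids division by zero


def select_hard_views(per_view_psnr, fraction=0.2):
    """Indices of the lowest-PSNR `fraction` of views (at least one view)."""
    scores = np.asarray(per_view_psnr, dtype=float)
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[:k]


def view_sampling_probs(num_views, hard_idx, boost=4.0):
    """Sampling distribution in which hard views are drawn `boost` times as often."""
    w = np.ones(num_views)
    w[hard_idx] *= boost
    return w / w.sum()


def refinement_schedule(per_view_psnr, n_iters, fraction=0.2, boost=4.0, seed=0):
    """Sequence of training-view indices to optimize during the refinement phase."""
    hard = select_hard_views(per_view_psnr, fraction)
    probs = view_sampling_probs(len(per_view_psnr), hard, boost)
    rng = np.random.default_rng(seed)
    return rng.choice(len(per_view_psnr), size=n_iters, p=probs)
```

For example, with per-view PSNRs `[30, 22, 28, 25, 31]` and `fraction=0.4`, views 1 and 3 are flagged and each is sampled with probability 4/11 instead of 1/5. Oversampling, rather than training on hard views alone, keeps the easy views in the mix so that global quality is not traded away.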

Paper Structure

This paper contains 35 sections, 12 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Our approach demonstrates remarkable memory efficiency and training speed, while achieving superior image quality
  • Figure 2: Pipeline overview: (1) Initialize with voxel-based Gaussian splatting, (2) Generate neural Gaussians with temporal information, (3) Apply HexPlane temporal corrections, (4) Optimize with color loss, total variation loss, and scaling regularization, (5) View refinement stage for underperforming viewpoints through adaptive densification.
  • Figure 3: Visual comparison of the proposed method with other methods on the HyperNeRF dataset. The proposed method produces better renderings.
  • Figure 4: Visual comparison with other methods on the Neu3D dataset. As the top and bottom-left panels show, the proposed method handles both the hand and the spinach in the pan well, whereas each competing method degrades on one or the other. More rendering videos can be found in the supplementary materials.
  • Figure 5: Consecutive frames on the HyperNeRF dataset compared with 4DGS. Top: ours; bottom: 4DGS. The proposed method renders more detail than 4DGS, as visible in the top-right region.
  • ...and 2 more figures
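Figure 2 lists three training objectives: a color loss, a total variation (TV) loss, and a scaling regularizer. A minimal sketch of how such terms are commonly combined follows. The L1 form of the color loss, the squared-difference TV over each HexPlane grid, the mean-volume scaling penalty, and the weights `lam_tv` and `lam_scale` are all assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def l1_color_loss(pred, gt):
    """Mean absolute error between rendered and ground-truth images (assumed L1 form)."""
    return float(np.mean(np.abs(pred - gt)))


def tv_loss(planes):
    """Average squared difference between neighboring cells of each (R, R, C) plane."""
    total = 0.0
    for g in planes:
        d0 = g[1:, :, :] - g[:-1, :, :]   # differences along the first plane axis
        d1 = g[:, 1:, :] - g[:, :-1, :]   # differences along the second plane axis
        total += float(np.mean(d0 ** 2) + np.mean(d1 ** 2))
    return total / len(planes)


def scaling_reg(scales):
    """Penalize large Gaussians by their mean volume proxy (product of per-axis scales).

    Expects positive scales of shape (N, 3); exponentiate first if stored in log space.
    """
    return float(np.mean(np.prod(scales, axis=1)))


def total_loss(pred, gt, planes, scales, lam_tv=1e-4, lam_scale=1e-2):
    """Weighted sum of the three terms; the weights here are hypothetical."""
    return (l1_color_loss(pred, gt)
            + lam_tv * tv_loss(planes)
            + lam_scale * scaling_reg(scales))
```

The TV term encourages the HexPlane features to vary smoothly in space and time, and the scaling term discourages oversized Gaussians that would blur fine detail.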