A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

Kai Katsumata, Duc Minh Vo, Hideki Nakayama

TL;DR

This work tackles real-time dynamic view synthesis by extending 3D Gaussian Splatting with a compact, time-parameterized Gaussian representation. It models Gaussian centers with a Fourier basis and rotations with a linear approximation of the quaternion, while keeping scale, color, and opacity time-invariant. This yields a memory complexity of $O(LN)$ and enables rendering at 118 FPS at a resolution of 1,352$\times$1,014 on a single GPU. A flow-based supervision strategy aligns scene flow to the input videos, and a two-stage optimization (static priors followed by dynamic refinement) combined with divide-and-prune strategies delivers robust reconstruction. Across the D-NeRF, DyNeRF, and HyperNeRF datasets, the approach achieves competitive visual quality with superior rendering speed, and its explicit Gaussian representation makes dynamic scenes easy to edit.

Abstract

3D Gaussian Splatting (3DGS) has shown remarkable success in synthesizing novel views given multiple views of a static scene. Yet, 3DGS faces challenges when applied to dynamic scenes because 3D Gaussian parameters need to be updated per timestep, requiring a large amount of memory and at least a dozen observations per timestep. To address these limitations, we present a compact dynamic 3D Gaussian representation that models positions and rotations as functions of time with a few parameter approximations while keeping other properties of 3DGS including scale, color and opacity invariant. Our method can dramatically reduce memory usage and relax a strict multi-view assumption. In our experiments on monocular and multi-view scenarios, we show that our method not only matches state-of-the-art methods, often linked with slower rendering speeds, in terms of high rendering quality but also significantly surpasses them by achieving a rendering speed of $118$ frames per second (FPS) at a resolution of 1,352$\times$1,014 on a single GPU.
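
The time parameterization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the coefficient layout (`[w0, a1, b1, a2, b2, ...]`), the dictionary keys, and the function names are assumptions made for illustration, and the exact basis used in the paper may differ. Each center coordinate is an offset plus a short Fourier series in the normalized time `t`, each rotation is a quaternion that varies linearly in `t` and is then normalized, and scale, color, and opacity are simply passed through unchanged.

```python
import math


def fourier_center(t, w):
    """Evaluate one center coordinate at time t (assumed to lie in [0, 1]).

    Assumed layout: w = [w0, a1, b1, a2, b2, ...], where w0 is a constant
    offset and (a_i, b_i) weight sin(2*pi*i*t) and cos(2*pi*i*t).
    """
    value = w[0]
    num_terms = (len(w) - 1) // 2
    for i in range(1, num_terms + 1):
        a, b = w[2 * i - 1], w[2 * i]
        angle = 2.0 * math.pi * i * t
        value += a * math.sin(angle) + b * math.cos(angle)
    return value


def linear_quaternion(t, q_base, q_slope):
    """Quaternion that changes linearly with t, normalized to unit length."""
    q = [b + s * t for b, s in zip(q_base, q_slope)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def gaussian_at(t, gaussian):
    """Return the parameters of one Gaussian at time t.

    Only the center and rotation depend on t; scale, color, and opacity
    are time-invariant and are copied as stored.
    """
    return {
        "center": [fourier_center(t, w) for w in gaussian["center_coeffs"]],
        "rotation": linear_quaternion(
            t, gaussian["rot_base"], gaussian["rot_slope"]
        ),
        "scale": gaussian["scale"],
        "color": gaussian["color"],
        "opacity": gaussian["opacity"],
    }


# Hypothetical Gaussian with a single Fourier term per axis.
example = {
    "center_coeffs": [[1.0, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
    "rot_base": [1.0, 0.0, 0.0, 0.0],
    "rot_slope": [0.0, 0.0, 0.0, 1.0],
    "scale": [0.1, 0.1, 0.1],
    "color": [0.8, 0.2, 0.2],
    "opacity": 0.9,
}
state = gaussian_at(0.25, example)
```

Because the same small set of coefficients is evaluated at every timestep, the stored parameter count per Gaussian does not grow with the number of frames, which is the source of the compactness claimed in the abstract.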

Paper Structure (15 sections, 9 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: We show examples of novel view synthesis on the Mutant scene in the D-NeRF dataset, along with visual quality (PSNR), rendering speed (FPS), and the memory used to store the optimized parameters. Our method yields reconstruction fidelity competitive with SoTA methods while rendering in real time, running $100\times$ faster than V4D with a reasonable memory size. Non-obvious differences in quality are highlighted. $\textbf{Bold}$ numbers indicate the best result among the methods with competitive rendering quality (excluding 3DGS), and $\underline{\textrm{underlined}}$ numbers indicate the second best.
  • Figure 2: Overview of our dynamic view synthesis framework. Our dynamic 3D Gaussian representation models 3D centers and rotations over time with Fourier and linear approximations, respectively. The representation parameters are shared across all timesteps, so observations at each timestep inform the representation at other timesteps, enabling a compact representation and the reconstruction of dynamic scenes from few-view videos. For simplicity, this figure illustrates the time-varying parameterization of only one Gaussian.
  • Figure 3: Qualitative comparison on the D-NeRF dataset (pumarola2021d). Differences are highlighted with zoomed-in views. Our method achieves visual quality competitive with strong baselines. While it successfully reconstructs intricate details such as hands, it blurs the shape of the sphere.
  • Figure 4: Qualitative comparison on the DyNeRF dataset (li2022neural). Differences are shown in zoomed-in views.
  • Figure 5: Qualitative comparison on the HyperNeRF dataset (park2021hypernerf). Our method produces sharp results.
  • ...and 2 more figures