A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction

Bin Zhang, Bi Zeng, Zexin Peng

TL;DR

Experimental results demonstrate that the proposed refined 3D Gaussian representation surpasses existing approaches in rendering quality and speed while significantly reducing the memory usage associated with 3D-GS, making it well suited to tasks such as novel view synthesis and dynamic mapping.

Abstract

In recent years, Neural Radiance Fields (NeRF) have revolutionized three-dimensional (3D) reconstruction with their implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) departs from the implicit representation of neural networks and instead represents scenes directly as point clouds with Gaussian-shaped distributions. This shift has notably improved the rendering quality and speed of radiance fields, but it has also led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. First, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points, and we express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Next, we introduce a learnable denoising mask coupled with a denoising loss to eliminate noise points from the scene, thereby further compressing the 3D Gaussian model. Finally, motion noise of points is mitigated through static constraints and motion consistency constraints. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed while significantly reducing the memory usage associated with 3D-GS, making it well suited to tasks such as novel view synthesis and dynamic mapping.
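
To make the idea of a learnable denoising mask more concrete, the following is a minimal NumPy sketch of how a per-Gaussian mask could gate opacity and then be used to prune points. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the mask or loss formulation, so `mask_logits`, the sigmoid gating, the 0.5 threshold, and the mean-mask regularizer `mask_sparsity_loss` are all hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_denoising_mask(opacity, mask_logits, threshold=0.5):
    """Gate per-Gaussian opacity with a soft mask and derive a hard keep flag.

    opacity:     (N,) per-Gaussian opacity in [0, 1]
    mask_logits: (N,) hypothetical learnable parameter, one per Gaussian
    """
    soft_mask = sigmoid(mask_logits)   # soft mask in (0, 1), usable during training
    keep = soft_mask > threshold       # hard decision, usable for pruning
    return opacity * soft_mask, keep

def mask_sparsity_loss(mask_logits):
    # Hypothetical regularizer: the mean soft-mask value.
    # Minimizing it pushes masks toward 0 unless rendering quality needs the point.
    return float(np.mean(sigmoid(mask_logits)))

# Toy scene with four Gaussians (values are made up for illustration).
opacity = np.array([0.9, 0.8, 0.05, 0.7])
mask_logits = np.array([3.0, 2.0, -4.0, -1.0])
positions = np.arange(12, dtype=np.float64).reshape(4, 3)

masked_opacity, keep = apply_denoising_mask(opacity, mask_logits)
pruned_positions = positions[keep]     # only kept Gaussians need to be stored
```

In this toy example the last two Gaussians fall below the threshold and are dropped, so only two of the four positions remain. Removing points in this way is one route by which a mask can shrink the stored model.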

Paper Structure

This paper contains 21 sections, 17 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Given a set of monocular multi-view images and camera poses (a), our approach not only facilitates the rendering of dynamic scenes (b) but also enables novel view time interpolation (c), leading to superior rendering quality compared to existing methods (d), alongside a notable reduction in memory usage.
  • Figure 2: Overview of our pipeline. The overall framework is described in Section \ref{sec:overview}. Section \ref{sec:deformable_3D_hash} explains how deformation fields and hash encoding work together for dynamic representation. The denoising mask is introduced in Section \ref{sec:denoising_mask}. Finally, Section \ref{sec:static_consistency} gives a concrete implementation of the static constraints and motion consistency constraints and shows how they help the network learn the dynamic offsets of points in the scene.
  • Figure 3: Qualitative comparisons of baselines and our method on the NeRF-DS real-world dataset. The results indicate that hash encoding and denoising masks mitigate certain high-frequency errors.
  • Figure 4: Qualitative comparisons of baselines and our method on the monocular synthetic dataset. The experiments demonstrate the efficacy of combining deformation fields and hash encoding for dynamic scene rendering with 3D Gaussians.
  • Figure 5: Qualitative comparison of the static constraint.
  • ...and 2 more figures