DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering

Jiahao Lu, Jiacheng Deng, Ruijie Zhu, Yanzhe Liang, Wenfei Yang, Tianzhu Zhang, Xu Zhou

TL;DR

DN-4DGS introduces a Noise Suppression Strategy that changes the coordinate distribution of the canonical 3D Gaussians to suppress noise, and a Decoupled Temporal-Spatial Aggregation Module that aggregates information from adjacent points and frames.

Abstract

Dynamic scene rendering is an intriguing yet challenging problem. Although current NeRF-based methods achieve satisfactory quality, they still cannot render in real time. Recently, 3D Gaussian Splatting (3DGS) has attracted researchers' attention for its outstanding rendering quality and real-time speed. This has led to a new paradigm: defining a set of canonical 3D Gaussians and deforming them to individual frames through deformable fields. However, the coordinates of the canonical 3D Gaussians are noisy, and this noise can propagate into the deformable fields; moreover, no existing method adequately considers the aggregation of 4D information. We therefore propose the Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering (DN-4DGS). Specifically, a Noise Suppression Strategy changes the coordinate distribution of the canonical 3D Gaussians and suppresses noise. Additionally, a Decoupled Temporal-Spatial Aggregation Module aggregates information from adjacent points and frames. Extensive experiments on various real-world datasets demonstrate that our method achieves state-of-the-art rendering quality at real-time speed.
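The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_deform` is a hypothetical stand-in for a learned deformation network, mean pooling is an assumed aggregation operator, and the parameters `dt` and `k` are illustrative. The sketch only shows the order of operations: temporal aggregation on the canonical coordinates first, then spatial aggregation over neighbours found in the already deformed (denoised) coordinates.

```python
import numpy as np

def knn_indices(points, k):
    # Pairwise squared distances, shape (N, N).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest points; each point is its own nearest neighbour.
    return np.argsort(d2, axis=1)[:, :k]

def toy_deform(points, t):
    # Hypothetical stand-in for a learned deformation network (assumption).
    return 0.1 * np.sin(t + points)

def temporal_aggregate(points, t, dt):
    # Average the offsets predicted at adjacent timestamps t - dt, t, t + dt.
    offsets = np.stack([toy_deform(points, s) for s in (t - dt, t, t + dt)])
    return offsets.mean(axis=0)

def spatial_aggregate(points, t, k):
    # Average each point's offset with those of its k nearest neighbours.
    idx = knn_indices(points, k)
    offsets = toy_deform(points, t)
    return offsets[idx].mean(axis=1)

def two_stage_deform(canonical, t, dt=0.05, k=4):
    # Stage 1: temporal aggregation on the (noisy) canonical coordinates.
    stage1 = canonical + temporal_aggregate(canonical, t, dt)
    # Stage 2: spatial aggregation, with neighbours searched in stage-1 coordinates.
    stage2 = stage1 + spatial_aggregate(stage1, t, k)
    return stage1, stage2

rng = np.random.default_rng(0)
canonical = rng.normal(size=(16, 3))
stage1, stage2 = two_stage_deform(canonical, t=0.3)
print(stage1.shape, stage2.shape)
```

The neighbour search runs on `stage1` rather than on `canonical`, which mirrors the stated motivation: spatial neighbourhoods are only meaningful once the coordinate noise has been reduced by the first deformation.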

Paper Structure

This paper contains 25 sections, 8 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: (a) Visualization results on the PlenopticVideo dataset [li2022neural]. (b) Visualization results on the HyperNeRF dataset [park2021hypernerf]. The numbers below the images are PSNR values.
  • Figure 2: Comparison of our rendered results with 4DGaussian [wu20234d]. The results are rendered on the HyperNeRF dataset [park2021hypernerf] and use the point cloud provided by HyperNeRF for Gaussian initialization ($Sparse\ Init$). Image 1: canonical 3D Gaussians generated by 4DGaussian. Image 2: deformable 3D Gaussians generated by 4DGaussian. Image 3: canonical 3D Gaussians generated by our method. Image 4: deformable 3D Gaussians after the first stage. Image 5: deformable 3D Gaussians after the second stage. Image 6: ground truth. The yellow box highlights that the two-stage deformation process lets our method produce higher-quality renderings.
  • Figure 3: The overall framework of our method, DN-4DGS. Our approach employs a two-stage deformation process. In the first deformation, the Temporal Aggregation Module aggregates temporal information. After the first deformation, the coordinate distribution of the 3D Gaussians is altered and noise is suppressed. The second deformation then uses the Denoised Spatial Aggregation Module to aggregate spatial information.
  • Figure 4: The structure of the aggregation operation.
  • Figure 5: More rendered images of canonical 3D Gaussians. Here, $Sparse\ Init$ refers to using the point cloud provided by the HyperNeRF dataset [park2021hypernerf] ($\mathrm{COLMAP_{SFM}}$ [schonberger2016structure]) for Gaussian initialization, while $Dense\ Init$ denotes generating a denser point cloud via $\mathrm{COLMAP_{MVS}}$ [schonberger2016structure]. $Dense\ Init$ yields better rendering quality, but regenerating the point cloud consumes more computational resources.
  • ...and 11 more figures
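
Figure 1 reports PSNR under each image. For reference, a minimal sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, is given here. It assumes images stored as arrays with values in $[0, 1]$; the function name `psnr` and the `max_val` parameter are illustrative, and the paper's exact evaluation code may differ.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    # Peak signal-to-noise ratio in dB between two images of equal shape.
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
value = psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1))
print(round(value, 2))
```

Higher values mean the rendered image is closer to the ground truth.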