
MGStream: Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction

Zhenyu Bao, Qing Li, Guibiao Liao, Zhongyuan Zhao, Kanglin Liu

TL;DR

MGStream tackles flickering and storage inefficiency in streamable dynamic scene reconstruction by isolating motion-related 3D Gaussian Splatting primitives for dynamics and keeping static primitives unchanged. It identifies motion-related 3DGs using a motion mask derived from optical flow and temporal difference, then maps these Gaussians to the motion mask with Gaussian ID maps (GIM) and a clustering-based convex hull to capture inside-object dynamics; deformation and attention-based optimization are applied only to these Gaussians, enabling emerging-object reconstruction with temporal consistency. This approach reduces both flicker and storage overhead while maintaining rendering fidelity. Experimental results on real-world datasets N3DV and MeetRoom demonstrate superior rendering quality, faster training/rendering, and improved temporal stability over state-of-the-art streaming 3DGS methods, with competitive offline performance.

Abstract

3D Gaussian Splatting (3DGS) has gained significant attention in streamable dynamic novel view synthesis (DNVS) for its photorealistic rendering capability and computational efficiency. Despite much progress in rendering quality and optimization strategies, 3DGS-based streamable dynamic scene reconstruction still suffers from flickering artifacts and storage inefficiency, and struggles to model emerging objects. To tackle these issues, we introduce MGStream, which employs motion-related 3D Gaussians (3DGs) to reconstruct the dynamic content and vanilla 3DGs to reconstruct the static content. The motion-related 3DGs are determined by the motion mask and the clustering-based convex hull algorithm. A rigid deformation is applied to the motion-related 3DGs to model the dynamics, and attention-based optimization on the motion-related 3DGs enables the reconstruction of emerging objects. Because deformation and optimization are conducted only on the motion-related 3DGs, MGStream avoids flickering artifacts and improves storage efficiency. Extensive experiments on the real-world datasets N3DV and MeetRoom demonstrate that MGStream surpasses existing streaming 3DGS-based approaches in rendering quality, training/storage efficiency, and temporal consistency. Our code is available at: https://github.com/pcl3dv/MGStream.
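As a rough illustration of the motion mask that the TL;DR describes as derived from optical flow and temporal difference, the sketch below thresholds both signals per pixel and combines them. It is not the authors' implementation: the function name `motion_mask`, both threshold values, the assumption that the optical flow has already been estimated by some external method, and the use of intersection as the default combination are hypothetical choices made only for illustration.

```python
import numpy as np


def motion_mask(prev_img, cur_img, flow,
                diff_thresh=0.1, flow_thresh=1.0, combine="and"):
    """Boolean (H, W) mask of pixels that appear to move between two frames.

    prev_img, cur_img: (H, W) or (H, W, C) images from the same camera view.
    flow: (H, W, 2) optical flow between the frames, assumed precomputed.
    diff_thresh, flow_thresh, combine: hypothetical parameters, not from the paper.
    """
    prev = np.asarray(prev_img, dtype=np.float64)
    cur = np.asarray(cur_img, dtype=np.float64)

    # Temporal difference |I_t - I_{t-1}|, reduced over channels if present.
    diff = np.abs(cur - prev)
    if diff.ndim == 3:
        diff = diff.max(axis=-1)
    diff_mask = diff > diff_thresh

    # Optical flow mask: pixels whose flow magnitude exceeds a threshold.
    flow_mag = np.linalg.norm(np.asarray(flow, dtype=np.float64), axis=-1)
    flow_mask = flow_mag > flow_thresh

    # How the two masks are merged is an assumption here.
    if combine == "and":
        return flow_mask & diff_mask
    if combine == "or":
        return flow_mask | diff_mask
    raise ValueError("combine must be 'and' or 'or'")
```

With the intersection, a pixel is kept only when both cues agree, which suppresses noise from either cue alone; the union would instead favor recall. Which behavior the paper uses is not stated in this summary.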

Paper Structure

This paper contains 13 sections, 12 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: MGStream aims to conduct dynamic novel view synthesis from multiple videos using a per-frame training paradigm. By employing the motion-related 3D Gaussians (3DGs) for modeling the dynamic and the vanilla 3DGs for the static, MGStream achieves high-quality rendering with storage efficiency, and avoids the flickering artifacts.
  • Figure 2: Overview of the proposed MGStream. A) shows the MGStream pipeline. First, MGStream initializes the 3DGs with multi-view inputs at timestep 0. For subsequent frames, MGStream uses the 3DGs from the previous timestep as initialization. Then, MGStream locates the motion-related 3DGs, which are deformed and optimized to model the dynamics. B) details how the motion-related 3DGs are found. MGStream employs the optical flow and temporal difference of adjacent frames to determine the motion mask, and establishes the correspondence between the motion mask and the motion-related 3DGs via the GIM and the clustering-based convex hull algorithm. C) shows the deformation and optimization of the motion-related 3DGs. All the motion-related 3DGs are deformed to model the dynamics, and the attention map is used to find those responsible for the emerging objects. Finally, optimization is applied to those 3DGs to reconstruct the new objects.
  • Figure 3: Illustration of the motion mask. (a) and (b) are images at frames $t-1$ and $t$ from the same camera view, respectively, and (c) is the difference between (a) and (b), i.e., $|I_t-I_{t-1}|$. (d), (e) and (f) are the optical flow mask, temporal difference mask, and motion mask, respectively.
  • Figure 4: Illustration of the clustering-based convex hull algorithm. (a) is the image from frame $t-1$. (b) and (c) are the rendered images without and with the convex hull algorithm, respectively. (d), (e) and (f) show the motion-related 3DGs. Specifically, (d) highlights the motion-related 3DGs $G_o$ obtained via back-projection, (e) shows the 3DGs $G_i$ inside the convex structure, and (f) shows the motion-related 3DGs $G_m$. Ignoring $G_i$ clearly leads to the artifacts in (b).
  • Figure 5: Illustration of Deformation and Optimization.
  • ...and 1 more figure
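The convex hull step described in the Figure 4 caption can be sketched as follows, under stated assumptions. The sketch reads the caption as $G_m = G_o \cup G_i$: the Gaussians $G_o$ found by back-projection are grouped into clusters, and every Gaussian whose center falls inside a cluster's convex hull is added as $G_i$. The function name `motion_related_ids` is hypothetical, the cluster labels are taken as an input because this summary does not say which clustering method is used, and the Delaunay-based inside test from SciPy is a stand-in technique rather than the paper's confirmed procedure.

```python
import numpy as np
from scipy.spatial import Delaunay


def motion_related_ids(positions, go_ids, cluster_labels):
    """Return sorted indices of G_m, read here as G_o plus the enclosed G_i.

    positions: (N, 3) centers of all 3D Gaussians.
    go_ids: (K,) indices of G_o, the Gaussians found via back-projection.
    cluster_labels: (K,) cluster label for each entry of go_ids; negative
        labels are treated as noise (an assumption of this sketch).
    Coplanar clusters make Qhull raise an error; that case is not handled.
    """
    positions = np.asarray(positions, dtype=np.float64)
    go_ids = np.asarray(go_ids, dtype=np.int64)
    cluster_labels = np.asarray(cluster_labels)

    in_gm = np.zeros(len(positions), dtype=bool)
    in_gm[go_ids] = True  # G_o always belongs to G_m

    for label in np.unique(cluster_labels):
        if label < 0:
            continue  # skip noise points
        members = positions[go_ids[cluster_labels == label]]
        if len(members) < 4:
            continue  # a 3D hull needs at least 4 points
        # A point lies inside the convex hull exactly when it lies inside
        # some simplex of the Delaunay triangulation of the hull's points.
        tri = Delaunay(members)
        in_gm |= tri.find_simplex(positions) >= 0  # adds G_i

    return np.flatnonzero(in_gm)
```

Building one hull per cluster rather than one hull over all of $G_o$ keeps separate moving objects from being merged into a single large region, which would otherwise pull static Gaussians between them into the motion-related set.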