Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang

TL;DR

IGS introduces a fast, generalizable streaming framework for dynamic scene reconstruction by coupling an Anchor-driven Gaussian Motion Network (AGM-Net) with a Key-frame-guided Streaming strategy. AGM-Net projects multi-view motion features to 3D anchor points to drive Gaussian primitives in a single forward pass, eliminating per-frame optimization. The key-frame streaming approach curbs error accumulation by refining key frames while generating candidate frames via AGM-Net, with a max-points bound to control growth. Across in-domain and cross-domain tests, IGS achieves around 2–3 seconds per-frame reconstruction with superior rendering quality and lower storage than prior streaming methods, enabling practical real-time streaming of dynamic scenes. The work demonstrates strong generalization and offers a scalable path toward interactive free-viewpoint rendering in live settings.

Abstract

Building Free-Viewpoint Videos in a streaming manner offers rapid responsiveness compared to offline training methods, greatly enhancing user experience. However, current streaming approaches suffer from high per-frame reconstruction time (10s+) and error accumulation, which limits their broader application. In this paper, we propose Instant Gaussian Stream (IGS), a fast and generalizable streaming framework, to address these issues. First, we introduce a generalized Anchor-driven Gaussian Motion Network, which projects multi-view 2D motion features into 3D space and uses anchor points to drive the motion of all Gaussians. This generalized network generates the motion of the Gaussians for each target frame in the time required for a single inference. Second, we propose a Key-frame-guided Streaming Strategy that refines each key frame, enabling accurate reconstruction of temporally complex scenes while mitigating error accumulation. We conducted extensive in-domain and cross-domain evaluations, demonstrating that our approach can achieve streaming with an average per-frame reconstruction time of 2s+, alongside an enhancement in view synthesis quality.
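The Key-frame-guided Streaming Strategy can be pictured as a simple schedule. The sketch below is illustrative only and is not the authors' implementation: `FramePlan`, `plan_stream`, and the fixed `key_interval` are hypothetical names and assumptions. It assumes frame 0 is an already reconstructed key frame and that key frames recur at a fixed interval; every frame is reached by deforming the most recent key frame, and a frame that becomes a key frame is additionally marked for refinement.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FramePlan:
    frame: int       # index of the frame in the stream
    source_key: int  # key frame whose Gaussians are deformed to reach this frame
    refine: bool     # True if this frame becomes a key frame and is refined


def plan_stream(num_frames: int, key_interval: int) -> List[FramePlan]:
    """Plan which key frame drives each frame (hypothetical helper).

    Assumption: frame 0 is the initial key frame and every
    `key_interval`-th frame after it is promoted to a key frame.
    """
    if num_frames < 1 or key_interval < 1:
        raise ValueError("num_frames and key_interval must be positive")

    plans: List[FramePlan] = []
    current_key = 0
    for t in range(1, num_frames):
        is_key = t % key_interval == 0
        # The frame is always inferred from the most recent key frame, so
        # motion errors do not chain through intermediate candidate frames.
        plans.append(FramePlan(frame=t, source_key=current_key, refine=is_key))
        if is_key:
            current_key = t  # later frames start from the refined key frame
    return plans


if __name__ == "__main__":
    for plan in plan_stream(num_frames=7, key_interval=3):
        print(plan)
```

Because each candidate frame depends only on its key frame, an inaccurate motion estimate for one frame is not carried into the next; only the refined key frames are carried forward.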

Paper Structure

This paper contains 33 sections, 10 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Performance comparison with previous SOTA methods [sun20243dgstream, li2023spacetime, Wu_2024_CVPR, li2022streaming, kplanes]. Our method achieves a per-frame reconstruction time of 2.67s, delivering high-quality rendering results in a streaming fashion (a)(b), with a noticeable improvement in performance (c). * denotes a streamable method.
  • Figure 2: The overall pipeline of IGS. (a) Starting from the key frame and moving towards the target frame, we extract the 2D Motion Feature Map. (b) We then sample M anchor points from the Gaussian primitives of the key frame. (c) The anchor points are projected onto these feature maps to obtain 3D motion features through Projection-aware Motion Feature Lift. (d) Each Gaussian point interpolates its own motion feature from neighboring anchors through a weighted aggregation of their features, which is then decoded into the motion of the Gaussian between the key frame and the target frame. (e) The entire streaming reconstruction process is guided by the Key-frame-guided Streaming strategy, where the key frame directly infers subsequent candidate frames until the next key frame is reached, at which point max-point bounded refinement is applied to that key frame.
  • Figure 3: The PSNR trend comparison on the sear steak scene.
  • Figure 4: Qualitative comparison from the Meeting Room dataset.
  • Figure 5: Qualitative comparison from the N3DV dataset.
  • ...and 5 more figures
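
Step (d) of Figure 2, where each Gaussian gathers a motion feature from neighboring anchors, can be illustrated with a small sketch. This is not the authors' code: the function name `interpolate_from_anchors`, the choice of `k` nearest anchors, and the inverse-distance weighting are all assumptions made for illustration, since the caption only states that a weighted aggregation over neighboring anchors is used.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def interpolate_from_anchors(
    point: Vec,
    anchors: List[Vec],
    anchor_feats: List[List[float]],
    k: int = 2,
    eps: float = 1e-8,
) -> List[float]:
    """Blend the features of the k nearest anchors (hypothetical helper).

    Assumption: weights are inverse distances normalized to sum to 1.
    """
    if not anchors or len(anchors) != len(anchor_feats):
        raise ValueError("anchors and anchor_feats must be non-empty and aligned")

    # Euclidean distance from the Gaussian center to every anchor.
    dists = [math.dist(point, a) for a in anchors]
    # Indices of the k closest anchors.
    nearest = sorted(range(len(anchors)), key=dists.__getitem__)[:k]
    # Closer anchors get larger weights; eps avoids division by zero.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)

    dim = len(anchor_feats[0])
    return [
        sum(w * anchor_feats[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    anchors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    feats = [[0.0], [1.0]]
    # A Gaussian midway between the two anchors receives the average feature.
    print(interpolate_from_anchors((1.0, 0.0, 0.0), anchors, feats))
```

In the pipeline described in the caption, the interpolated feature would then be passed to a decoder that outputs the Gaussian's motion between the key frame and the target frame; that decoder is outside the scope of this sketch.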