StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

Zike Wu, Qi Yan, Xuanyu Yi, Lele Wang, Renjie Liao

TL;DR

This work introduces StreamSplat, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations online, reconstructing arbitrarily long streams with a 1200x speedup over optimization-based methods.

Abstract

Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams demands robust online methods that recover scene dynamics from sparse observations under strict latency and memory constraints. Yet most dynamic reconstruction methods rely on hours of per-scene optimization with access to the full sequence, which limits practical deployment. In this work, we introduce StreamSplat, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner. This is achieved through three key technical innovations: 1) a probabilistic sampling mechanism that robustly predicts 3D Gaussians from uncalibrated inputs; 2) a bidirectional deformation field that yields reliable associations across frames and mitigates long-term error accumulation; 3) an adaptive Gaussian fusion operation that propagates persistent Gaussians while handling emerging and vanishing ones. Extensive experiments on standard dynamic and static benchmarks demonstrate that StreamSplat achieves state-of-the-art performance in both reconstruction quality and dynamic scene modeling. Uniquely, our method supports the online reconstruction of arbitrarily long video streams with a 1200x speedup over optimization-based methods. Our code and models are available at https://streamsplat3d.github.io/.
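
The abstract states that a bidirectional deformation field yields reliable associations across frames. One common way to turn forward and backward motion into trustworthy matches is a forward-backward consistency check, sketched below. This is a swapped-in, generic technique, not StreamSplat's documented procedure: it assumes each frame-0 Gaussian carries a forward displacement to frame 1 and each frame-1 Gaussian a backward displacement to frame 0. The names `warp`, `nearest`, `consistent_matches`, and the tolerance `tol` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def warp(p: Vec3, d: Vec3) -> Vec3:
    """Apply a displacement d to a 3D point p."""
    return (p[0] + d[0], p[1] + d[1], p[2] + d[2])


def nearest(p: Vec3, pts: Sequence[Vec3]) -> int:
    """Index of the point in pts closest to p (brute force, fine for a sketch)."""
    return min(range(len(pts)), key=lambda j: math.dist(p, pts[j]))


def consistent_matches(mu0: Sequence[Vec3], fwd: Sequence[Vec3],
                       mu1: Sequence[Vec3], bwd: Sequence[Vec3],
                       tol: float = 0.1) -> List[Tuple[int, int]]:
    """Forward-backward consistency check between two sets of Gaussian centers.

    mu0, fwd: frame-0 centers and their (hypothetical) forward displacements 0 -> 1.
    mu1, bwd: frame-1 centers and their (hypothetical) backward displacements 1 -> 0.
    A pair (i, j) is kept only if warping i forward lands near j AND warping j
    backward returns near i; both distances must be below tol.
    """
    matches = []
    for i, (p, d) in enumerate(zip(mu0, fwd)):
        landed = warp(p, d)
        j = nearest(landed, mu1)
        returned = warp(mu1[j], bwd[j])
        if math.dist(landed, mu1[j]) < tol and math.dist(returned, p) < tol:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    mu0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    fwd = [(0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    mu1 = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
    bwd = [(-0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(consistent_matches(mu0, fwd, mu1, bwd))  # [(0, 0), (1, 1)]
```

In this toy setup, frame-0 Gaussian 2 finds no consistent partner (a vanishing candidate) and frame-1 Gaussian 2 is never matched (an emerging candidate); checking both directions is what rejects one-sided, spurious associations.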

Paper Structure

This paper contains 18 sections, 11 equations, 17 figures, 9 tables, and 2 algorithms.

Figures (17)

  • Figure 1: Given an uncalibrated video stream, StreamSplat instantly reconstructs a dynamic 3D Gaussian scene in an online manner, enabling continuous-time 3D reconstruction, depth estimation, and novel view synthesis.
  • Figure 2: Overview of StreamSplat. Given a pair of frames ($t_1=0$, $t_n=1$), we first encode them with the Static Encoder to produce canonical 3D Gaussians (see the Static Encoder subsection), and then pass the 3DGS embeddings to the Dynamic Decoder to predict the deformation field (see the Dynamic Decoder subsection). The predicted dynamic 3D Gaussians can be rendered at any time $t \in [0,1]$.
  • Figure 3: Our opacity deformation jointly models persistent, emerging, and vanishing Gaussians.
  • Figure 4: Persistent Gaussians across frames. Red/green-marked Gaussians from the initial frame are propagated across frames, showing that adaptive Gaussian fusion preserves long-term temporal consistency under viewpoint and motion changes. Videos are available on the project website.
  • Figure 5: Qualitative comparison on DAVIS. Blue box: given frames; red box: novel frames. StreamSplat produces high-fidelity, temporally coherent results on both (a) 5-frame and (b) 8-frame interval tasks.
  • ...and 12 more figures
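
Figures 2 and 3 describe dynamic Gaussians that can be rendered at any time $t \in [0,1]$ and an opacity deformation that jointly covers persistent, emerging, and vanishing Gaussians. The sketch below illustrates that idea under loudly stated assumptions: each Gaussian moves linearly and its opacity blends linearly between a value at $t=0$ and one at $t=1$. The `DynGaussian` class, `state_at`, and both linear schedules are hypothetical; in StreamSplat the deformation is predicted by the Dynamic Decoder, and its exact parameterization is not given here.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DynGaussian:
    """Hypothetical dynamic Gaussian: only center and opacity are modeled here."""
    mu: Vec3        # center at t = 0
    disp: Vec3      # displacement reached at t = 1 (linear motion assumed)
    alpha0: float   # opacity at t = 0
    alpha1: float   # opacity at t = 1

    def kind(self, eps: float = 1e-6) -> str:
        """Classify the Gaussian from its opacity endpoints."""
        visible0, visible1 = self.alpha0 > eps, self.alpha1 > eps
        if visible0 and visible1:
            return "persistent"
        if visible1:
            return "emerging"
        if visible0:
            return "vanishing"
        return "inactive"


def state_at(g: DynGaussian, t: float) -> Tuple[Vec3, float]:
    """Center and opacity at time t in [0, 1] under linear motion and a linear opacity blend."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    pos = tuple(m + t * d for m, d in zip(g.mu, g.disp))
    alpha = (1.0 - t) * g.alpha0 + t * g.alpha1
    return pos, alpha


if __name__ == "__main__":
    gaussians = [
        DynGaussian((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.9, 0.9),  # moves along +x, stays visible
        DynGaussian((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.8, 0.0),  # static, fades out
        DynGaussian((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), 0.0, 0.6),  # static, fades in
    ]
    for g in gaussians:
        print(g.kind(), state_at(g, 0.5))
```

Encoding appearance and disappearance purely through opacity keeps a single set of Gaussians valid over the whole interval, so no primitives need to be created or deleted mid-interval; a learned model would replace the linear schedules with predicted ones.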