EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation

Chao Zhang, Yifeng Zhou, Shuheng Wang, Wenfa Li, Degang Wang, Yi Xu, Shaohui Jiao

TL;DR

The paper tackles the challenge of long-sequence dynamic scene reconstruction with complex motions by avoiding GoP-based segmentation and adopting an explicit evolving 3D Gaussian representation. It introduces EvolvingGS, a two-stage pipeline consisting of a Warping Stage for coarse, flow-guided alignment using sparse control points and a Detail Refinement Stage that spawns and prunes Gaussians to handle topology changes while keeping appearance features temporally coherent; a differential temporal encoding scheme further compresses the evolving model. The approach leverages a two-stream refinement (reference and extension Gaussians) with a contribution-based pruning metric to control model growth, and integrates an adaptive iteration strategy to balance quality and efficiency. Experiments on public and challenging custom datasets show state-of-the-art reconstruction quality for extended GoP lengths and achieve over 50x compression, demonstrating substantial practical impact for streaming dynamic scenes with high fidelity.

Abstract

We have recently seen great progress in 3D scene reconstruction through explicit point-based 3D Gaussian Splatting (3DGS), notable for its high quality and fast rendering speed. However, reconstructing dynamic scenes such as complex human performances with long durations remains challenging. Prior efforts fall short of modeling a long-term sequence with drastic motions, frequent topology changes or interactions with props, and resort to segmenting the whole sequence into groups of frames that are processed independently, which undermines temporal stability and thereby leads to an unpleasant viewing experience and inefficient storage footprint. In view of this, we introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to coarsely align with the target frame, and then refines it with minimal point addition/subtraction, particularly in fast-changing areas. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics while maintaining fast rendering through its purely explicit representation. Moreover, by exploiting temporal coherence between successive frames, we propose a simple yet effective compression algorithm that achieves over 50x compression rate. Extensive experiments on both public benchmarks and challenging custom datasets demonstrate that our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.
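
The compression idea mentioned above, exploiting temporal coherence between successive frames, can be illustrated with a generic delta-coding sketch. This is not the paper's algorithm, whose details are not given in this summary. It assumes a fixed set of Gaussians whose per-frame attributes are flattened into equal-length lists, and it ignores points added or pruned between frames. The quantization step `STEP` and all function names are hypothetical.

```python
import struct
import zlib

STEP = 1e-3  # hypothetical quantization step, not taken from the paper


def quantize(values, step=STEP):
    """Map float attributes to integer codes."""
    return [round(v / step) for v in values]


def dequantize(codes, step=STEP):
    """Map integer codes back to approximate float attributes."""
    return [c * step for c in codes]


def encode_sequence(frames, step=STEP):
    """Store the first frame as integer codes and every later frame as
    the integer difference from the previous frame's codes."""
    key_codes = quantize(frames[0], step)
    prev_codes = key_codes
    residuals = []
    for frame in frames[1:]:
        codes = quantize(frame, step)
        residuals.append([c - p for c, p in zip(codes, prev_codes)])
        prev_codes = codes
    return key_codes, residuals


def decode_sequence(key_codes, residuals, step=STEP):
    """Rebuild every frame by accumulating residuals in the integer
    domain, so quantization error does not grow over time."""
    codes = list(key_codes)
    frames = [dequantize(codes, step)]
    for res in residuals:
        codes = [c + r for c, r in zip(codes, res)]
        frames.append(dequantize(codes, step))
    return frames


def packed_size(int_lists):
    """Bytes used after packing as 32-bit ints and applying zlib."""
    raw = b"".join(struct.pack(f"<{len(x)}i", *x) for x in int_lists)
    return len(zlib.compress(raw, 9))
```

Because residuals between neighboring frames are small and repetitive when motion is smooth, a general-purpose entropy coder tends to shrink them far more than the absolute values. The ratio actually achieved depends on the data and on the real encoding scheme, so this sketch says nothing about the reported 50x figure.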

Paper Structure

This paper contains 12 sections, 9 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Our EvolvingGS framework enables continuous reconstruction of dynamic sequences (top) across diverse scenarios (bottom). Our method ensures consistent high-fidelity rendering and temporal coherence throughout extended dynamic sequences of any length, effectively handling complex motions and realistic clothing deformations without dependence on global keyframe switching. The method achieves efficient compression across varied capture scenarios, with over 50x compression rate while preserving visual quality.
  • Figure 2: Overview of the EvolvingGS framework. (a) Our proposed method starts by establishing a baseline Gaussian model for the first frame using the original 3DGS algorithm. (b) The warping stage non-rigidly deforms the Gaussian model to coarsely fit the appearance captured by the next frame. (c) A refinement strategy with addition/subtraction of Gaussian points enabled is applied to the deformed result to handle emerging or vanishing objects and further improve detail quality. Appearance-related features of Gaussians are fixed to promote temporal coherence. The framework iteratively applies stages (b) and (c) throughout the sequence, adaptively evolving the representation through reference and extension points (shown in green and red respectively in the evolving view panel) to handle topology changes and new geometric details.
  • Figure 3: The warping stage fails to handle cases such as topology changes (upper left) or fine-scale deformations such as facial expressions (upper right). The bottom row shows the corresponding results after refinement.
  • Figure 4: Variance comparison on $G^{ref}$/$G^{ext}$ feature residuals and their raw feature values.
  • Figure 5: (a) Ablation study on the pruning-by-contribution strategy. (b) Relation between the pruning threshold and the average Gaussian point number/PSNR. $\epsilon_{\beta}$ is always set 3x larger than $\epsilon_{\alpha}$.
  • ...and 3 more figures