Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction

Jiacong Chen, Qingyu Mao, Youneng Bao, Xiandong Meng, Fanyang Meng, Ronggang Wang, Yongsheng Liang

TL;DR

This work tackles the storage bottleneck in online free-viewpoint video by introducing Compact Gaussian Streaming (ComGS), a motion-aware, keypoint-driven framework. It identifies motion regions with a viewspace gradient difference strategy, propagates keypoint motion to neighboring Gaussian points through adaptive spatial influence fields, and applies an error-aware correction to key frames, drastically reducing data transmission while preserving rendering quality. The approach reduces storage by over 159× relative to 3DGStream and by 14× relative to the state-of-the-art method QUEEN, with competitive PSNR and real-time rendering, through selective updates and compact per-keypoint parameterization. This has practical implications for real-time volumetric video streaming and interactive 3D viewing in bandwidth-constrained environments.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a high-fidelity and efficient paradigm for online free-viewpoint video (FVV) reconstruction, offering viewers rapid responsiveness and immersive experiences. However, existing online methods face prohibitive storage requirements, primarily because their point-wise modeling fails to exploit the properties of motion. To address this limitation, we propose a novel Compact Gaussian Streaming (ComGS) framework that leverages the locality and consistency of motion in dynamic scenes and models object-consistent Gaussian point motion through a keypoint-driven motion representation. By transmitting only the keypoint attributes, this framework provides a more storage-efficient solution. Specifically, we first identify a sparse set of motion-sensitive keypoints localized within motion regions using a viewspace gradient difference strategy. Equipped with these keypoints, we propose an adaptive motion-driven mechanism that predicts a spatial influence field for propagating keypoint motion to neighboring Gaussian points with similar motion. Moreover, ComGS adopts an error-aware correction strategy for key frame reconstruction that selectively refines erroneous regions and mitigates error accumulation without unnecessary overhead. Overall, ComGS achieves a remarkable storage reduction of over 159× compared to 3DGStream and 14× compared to the SOTA method QUEEN, while maintaining competitive visual fidelity and rendering speed.
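
The keypoint-driven idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract says the spatial influence field is predicted adaptively but gives no formula, so the sketch assumes a fixed isotropic Gaussian falloff per keypoint and translation-only motion. The function name `propagate_keypoint_motion` and the `radii` parameter are hypothetical.

```python
import numpy as np

def propagate_keypoint_motion(points, keypoints, translations, radii):
    """Blend per-keypoint translations onto nearby Gaussian centers.

    points:       (N, 3) Gaussian centers of the previous frame
    keypoints:    (K, 3) keypoint positions
    translations: (K, 3) per-keypoint translation for the current frame
    radii:        (K,)   assumed influence radius of each keypoint (hypothetical)
    """
    # Squared distance from every point to every keypoint: shape (N, K).
    diff = points[:, None, :] - keypoints[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)

    # Assumed influence field: isotropic Gaussian falloff around each keypoint.
    weights = np.exp(-dist2 / (2.0 * radii[None, :] ** 2))

    # Normalize only where influences overlap (sum > 1), so that points far
    # from every keypoint keep a near-zero offset and stay static.
    total = np.maximum(weights.sum(axis=1, keepdims=True), 1.0)
    offsets = (weights @ translations) / total
    return points + offsets
```

Under these assumptions, only the `K` keypoint positions, translations, and radii would need to be transmitted per frame, rather than an update for each of the `N` Gaussian points, which is where the storage saving of a keypoint-driven representation comes from.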

Paper Structure

This paper contains 23 sections, 15 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Left: Experimental results on the N3DV dataset [li2022neural] showcase the effectiveness of our method, which reduces the storage requirement of 3DGStream [sun20243dgstream] by 159$\times$ with enhanced visual quality. Right: Comparison with existing methods in storage and reconstruction fidelity. Hollow circles denote offline methods, while solid circles represent online methods.
  • Figure 2: The overall pipeline of the ComGS framework. (a) The reconstruction process starts from the first frame, initialized using vanilla 3DGS [kerbl20233d]. Subsequent frames are organized into groups of frames (GoFs). For non-key frames, (b) we begin with motion-sensitive keypoint selection using a viewspace gradient difference strategy, and (c) utilize an adaptive motion-driven mechanism to control the motion of neighboring points. For key frames, (d) an error-aware correction strategy is introduced to mitigate error accumulation across frames.
  • Figure 3: Quantitative comparison. We visualize our method and other online FVV methods on the N3DV [li2022neural] and MeetRoom [li2022streaming] datasets.
  • Figure 4: Visualization of our keypoint-driven motion representation. Top: selected keypoints are concentrated in motion regions. Bottom: adaptive control of neighboring points also focuses on motion-intensive areas, enabling accurate and efficient motion modeling.
  • Figure 5: Visualization of different selection methods and corresponding updated regions.
  • ...and 4 more figures
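
The keypoint selection step named in the Figure 2 caption can be sketched roughly as follows. This is one plausible reading, not the paper's procedure: it assumes each Gaussian point carries a scalar viewspace gradient magnitude under the previous and current frames, and that the points with the largest absolute difference are chosen as keypoints. The function name `select_keypoints` and its parameters are hypothetical.

```python
import numpy as np

def select_keypoints(grad_prev, grad_curr, num_keypoints):
    """Return indices of the points whose gradient magnitude changed most.

    grad_prev:     (N,) assumed per-point viewspace gradient magnitude, previous frame
    grad_curr:     (N,) same quantity measured against the current frame
    num_keypoints: number of keypoints to keep (hypothetical budget)
    """
    diff = np.abs(grad_curr - grad_prev)
    k = min(num_keypoints, diff.shape[0])
    if k <= 0:
        return np.empty(0, dtype=np.int64)

    # Partial selection of the k largest differences, then sort them descending.
    top = np.argpartition(-diff, k - 1)[:k]
    return top[np.argsort(-diff[top])]
```

A fixed top-k budget is used here only to keep the sketch short; a threshold on the difference would be an equally reasonable assumption.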