RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians

Qiankun Gao, Yanmin Wu, Chengxiang Wen, Jiarui Meng, Luyang Tang, Jie Chen, Ronggang Wang, Jian Zhang

TL;DR

Dynamic scenes with large-scale, complex motions are difficult to reconstruct with existing NeRF- and 3DGS-based methods. RelayGS introduces a three-stage pipeline that decouples foreground from background, densifies along motion trajectories by placing Relay Gaussians in successive temporal segments, and jointly optimizes a 4D spatiotemporal model using HexPlane-based MLPs with a gamma scaling for large motions. The method outperforms prior state-of-the-art methods by more than 1 dB in PSNR on the PanopticSports and VRU Basketball datasets, offering improved foreground completeness and motion coherence while maintaining practical rendering speeds. This approach provides a scalable, explicit 4D representation suitable for real-world dynamic scenes and sports videography.

Abstract

Reconstructing dynamic scenes with large-scale and complex motions remains a significant challenge. Recent techniques like Neural Radiance Fields and 3D Gaussian Splatting (3DGS) have shown promise but still struggle with scenes involving substantial movement. This paper proposes RelayGS, a novel method based on 3DGS, specifically designed to represent and reconstruct highly dynamic scenes. RelayGS learns a complete 4D representation, made up of canonical 3D Gaussians and a compact motion field, in three stages. First, we learn a fundamental 3DGS from all frames, ignoring temporal scene variations, and use a learnable mask to separate the highly dynamic foreground from the minimally moving background. Second, we replicate multiple copies of the decoupled foreground Gaussians from the first stage, each corresponding to a temporal segment, and optimize them using pseudo-views constructed from multiple frames within each segment. These Gaussians, termed Relay Gaussians, act as explicit relay nodes, breaking down large-scale motion trajectories into smaller, manageable segments. Finally, we jointly learn the scene's temporal motion and refine the canonical Gaussians learned from the first two stages. We conduct thorough experiments on two dynamic scene datasets featuring large and complex motions, where RelayGS outperforms state-of-the-art methods by more than 1 dB in PSNR and reconstructs real-world basketball game scenes in a much more complete and coherent manner, whereas previous methods usually struggle to capture the complex motion of players. Code will be publicly available at https://github.com/gqk/RelayGS
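The second stage described in the abstract (one copy of the foreground Gaussians per temporal segment) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Gaussian` class, the function names `split_into_segments` and `make_relay_gaussians`, and the equal-length segmentation rule are all assumptions made for illustration, and a real 3D Gaussian also carries covariance, opacity, and color.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Gaussian:
    """Toy stand-in for a 3D Gaussian (hypothetical; only a center and a mask flag)."""
    center: Tuple[float, float, float]
    is_foreground: bool  # outcome of the stage-one foreground/background mask
    segment: int = -1    # -1 means "not tied to any temporal segment"


def split_into_segments(num_frames: int, num_segments: int) -> List[range]:
    """Partition frame indices into contiguous, near-equal temporal segments.

    Equal-length segments are an assumption; the abstract does not say how
    segment boundaries are chosen.
    """
    if not 1 <= num_segments <= num_frames:
        raise ValueError("need 1 <= num_segments <= num_frames")
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def make_relay_gaussians(gaussians: List[Gaussian], num_segments: int) -> List[Gaussian]:
    """Replicate every foreground Gaussian once per temporal segment.

    Each copy is tagged with its segment index so that it could later be
    optimized only against frames from that segment.
    """
    relays = []
    for seg in range(num_segments):
        for g in gaussians:
            if g.is_foreground:
                relays.append(replace(g, segment=seg))
    return relays


if __name__ == "__main__":
    scene = [
        Gaussian((0.0, 0.0, 0.0), is_foreground=False),  # background
        Gaussian((1.0, 0.0, 0.0), is_foreground=True),   # moving player
        Gaussian((1.1, 0.2, 0.0), is_foreground=True),   # moving player
    ]
    for frames in split_into_segments(num_frames=10, num_segments=4):
        print("segment frames:", list(frames))
    print("relay count:", len(make_relay_gaussians(scene, num_segments=4)))
```

Background Gaussians are left untouched here because the abstract states that only the decoupled foreground is replicated; each relay copy then only has to explain the motion inside its own short segment.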

Paper Structure

This paper contains 22 sections, 10 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Framework of the proposed RelayGS. (a) Initialize the scene with all images and separate the relatively static background from the dynamic foreground using a learnable mask (visualized as yellow and red). (b) Construct pseudo-GT views through multi-view blending to optimize Relay Gaussians for decomposing complex trajectories. (c) Based on the HexPlane 4D representation, use different MLPs for foreground and background Gaussians to obtain temporal deformation, and then render through the differentiable pipeline of 3DGS.
  • Figure 2: Qualitative comparisons on GZ scene of VRU Basketball Games dataset.
  • Figure 3: Qualitative comparisons on Football scene of PanopticSports dataset.
  • Figure 4: Visualization of canonical 3D Gaussians. (a) Reference image of the scene. (b) Initialization by 4D-GS, with the foreground Gaussians almost eliminated. (c) Initialization by our method achieves separation of background and foreground, visualized in different colors. (d) Relay Gaussians (red) generated in the second stage realize the decomposition of large-scale complex trajectories.
  • Figure 5: Illustrative depiction of two types of densification. In 3DGS for static scene reconstruction, spatial densification is employed to better fit 3D structures. Prior 4D methods, as shown in (a), perform densification within a canonical 3D space, relying on deformation fields to model motion trajectories, but often fail to sufficiently represent these trajectories. As shown in (b), explicitly densifying along the motion trajectory by adding new Gaussians enables a more accurate representation of dynamic motion. Our method introduces Relay Gaussians, fundamentally rooted in the intrinsic combination of spatial and temporal densification, enabling enhanced 4D reconstruction.
  • ...and 7 more figures
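
The pseudo-GT views mentioned in the Figure 1(b) caption are built by blending several frames. The caption does not specify the blending rule, so the sketch below assumes a simple per-pixel weighted average of frames from one camera within one temporal segment; `blend_pseudo_view` and the grayscale list-of-rows image format are hypothetical choices for illustration, not the paper's procedure.

```python
from typing import List, Optional, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def blend_pseudo_view(frames: Sequence[Image],
                      weights: Optional[Sequence[float]] = None) -> Image:
    """Blend same-sized frames into one pseudo-view by weighted averaging.

    Uniform weights are used when none are given (an assumed default).
    """
    if not frames:
        raise ValueError("need at least one frame")
    if weights is None:
        weights = [1.0] * len(frames)
    if len(weights) != len(frames):
        raise ValueError("one weight per frame is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same size")

    # Accumulate each frame's normalized contribution pixel by pixel.
    blended = [[0.0] * width for _ in range(height)]
    for frame, w in zip(frames, weights):
        for y in range(height):
            for x in range(width):
                blended[y][x] += (w / total) * frame[y][x]
    return blended
```

Under this assumption, pixels that stay constant across the segment keep their value, while pixels swept by a moving object become a mixture, so a single pseudo-view summarizes where the foreground was during that segment.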