Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping

Junmyeong Lee, Hoseung Choi, Minsu Cho

TL;DR

Motion Group-aware Gaussian Forecasting (MoGaF) is a framework for long-term scene extrapolation built on the 4D Gaussian Splatting representation. It introduces motion-aware Gaussian grouping and group-wise optimization to enforce physically consistent motion across both rigid and non-rigid regions.

Abstract

Forecasting dynamic scenes remains a fundamental challenge in computer vision, as limited observations make it difficult to capture coherent object-level motion and long-term temporal evolution. We present Motion Group-aware Gaussian Forecasting (MoGaF), a framework for long-term scene extrapolation built upon the 4D Gaussian Splatting representation. MoGaF introduces motion-aware Gaussian grouping and group-wise optimization to enforce physically consistent motion across both rigid and non-rigid regions, yielding spatially coherent dynamic representations. Leveraging this structured space-time representation, a lightweight forecasting module predicts future motion, enabling realistic and temporally stable scene evolution. Experiments on synthetic and real-world datasets demonstrate that MoGaF consistently outperforms existing baselines in rendering quality, motion plausibility, and long-term forecasting stability. Our project page is available at https://slime0519.github.io/mogaf
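To make motion-aware grouping more concrete, here is a minimal sketch. It assumes each Gaussian's center trajectory over the observed frames is available as an (N, T, 3) array. This is not MoGaF's grouping procedure, which the paper describes as a hybrid approach that also uses grounded 2D segmentation. Instead, it swaps in plain k-means on per-Gaussian displacement trajectories, and the function name `group_by_motion` is hypothetical.

```python
import numpy as np

def group_by_motion(traj: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Cluster Gaussians into k motion groups (hypothetical helper).

    traj: (N, T, 3) array of Gaussian centers over T observed frames.
    Returns an (N,) array of integer group labels.
    """
    n = traj.shape[0]
    # Describe each Gaussian by its displacement relative to the first frame,
    # so Gaussians that move together share a feature wherever they sit.
    feats = (traj - traj[:, :1, :]).reshape(n, -1)

    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [feats[0]]
    for _ in range(1, k):
        d = np.min([np.sum((feats - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(feats[np.argmax(d)])
    centers = np.stack(centers)

    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Assign every Gaussian to its nearest motion center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        new = np.stack([
            feats[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Usage: ten static Gaussians plus ten translating along x.
rng = np.random.default_rng(0)
T = 8
base = rng.normal(size=(20, 3))
traj = np.repeat(base[:, None, :], T, axis=1)
traj[10:] += np.arange(T)[None, :, None] * np.array([0.5, 0.0, 0.0])
labels = group_by_motion(traj, k=2)
```

Because the features are displacements relative to the first observed frame, Gaussians that translate together collapse onto a single feature. Points on a rotating object, however, have position-dependent displacements and may be split across groups. That limitation is one motivation for adding segmentation cues.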

Paper Structure

This paper contains 62 sections, 32 equations, 13 figures, 6 tables, and 3 algorithms.

Figures (13)

  • Figure 1: MoGaF forecasts future frames of dynamic-scene input videos by reconstructing object-level components with distinct motion patterns. Our approach delivers long-term, high-fidelity predictions even on real-world videos with complex dynamics.
  • Figure 2: Overall pipeline of MoGaF. Given a video, MoGaF generates future frames of the scene. To achieve realistic forecasting, our method builds on the 4DGS representation and proceeds as follows: (1) Gaussian Grouping: Gaussians are clustered into motion-consistent object groups, and each group is labeled as rigid or non-rigid using grounded 2D segmentation. (2) Group-wise Optimization: Grouped Gaussians are refined with rigidity-aware motion constraints: rigid groups are guided by a shared $SE(3)$ transform, while non-rigid groups are regularized with local motion smoothness. (3) Group-wise Forecasting: For each group, a lightweight Transformer-based forecaster extrapolates Gaussian trajectories beyond the observed frames, enabling rendering at novel viewpoints for future timesteps.
  • Figure 3: Result of Gaussian grouping. Compared with (a) a simple extension of 3DGS grouping [lyu2024gaga] and (b) single-frame mask-based region growing, our hybrid approach produces complete and reliable motion-aware Gaussian groups.
  • Figure 4: Qualitative results on the iPhone dataset. We present forecasted frames from test camera views. In (a), the first 80% of frames are used for training and the remaining 20% are forecasted; in (b), the first 60% are used for training and the remaining 40% are forecasted.
  • Figure 5: Forecasting results on the D-NeRF dataset. We render extrapolated future frames. In Obs. Timesteps, the first and second columns show renderings reconstructed from training views at the first and last observed timesteps, respectively.
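Step (2) of the pipeline in Figure 2 can be illustrated with a small NumPy sketch. It assumes the Gaussian centers of one group are given as (N, 3) arrays, once in a canonical frame and once at a single timestep. For a rigid group, `fit_se3` recovers the single least-squares $SE(3)$ transform with the Kabsch algorithm, and `rigid_loss` measures how far the group deviates from that transform. For a non-rigid group, `smoothness_loss` penalizes differences between each Gaussian's displacement and the displacements of its k nearest canonical neighbors. The function names, the use of Kabsch, and the exact form of both losses are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def fit_se3(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (Kabsch): dst is approx. src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # flip sign to avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def rigid_loss(canon: np.ndarray, current: np.ndarray) -> float:
    """Mean squared deviation of a group from its best shared SE(3) motion."""
    R, t = fit_se3(canon, current)
    residual = canon @ R.T + t - current
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def smoothness_loss(canon: np.ndarray, current: np.ndarray, k: int = 4) -> float:
    """Penalise displacement differences between each Gaussian and its
    k nearest neighbours in the canonical frame (hypothetical regulariser)."""
    disp = current - canon
    d2 = np.sum((canon[:, None, :] - canon[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]         # (N, k) neighbour indices
    diff = disp[:, None, :] - disp[nbrs]         # (N, k, 3)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# Usage: a rigidly rotated and translated group vs. a jittered one.
rng = np.random.default_rng(0)
canon = rng.normal(size=(50, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rigid_now = canon @ Rz.T + np.array([1.0, 2.0, 3.0])
R_est, t_est = fit_se3(canon, rigid_now)
jittered = canon + 0.1 * rng.normal(size=canon.shape)
translated = canon + np.array([1.0, 0.0, 0.0])
```

A rigidly moving group yields a near-zero `rigid_loss`. The neighbor-displacement term, by contrast, is zero only for pure translation and grows slightly under rotation. When that matters, rotation-invariant alternatives such as as-rigid-as-possible energies are a common substitute.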