ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting

Daniel Wang, Patrick Rim, Tian Tian, Dong Lao, Alex Wong, Ganesh Sundaramoorthi

TL;DR

The paper addresses dynamic scene extrapolation by predicting future 3D scene states from past observations. It introduces ODE-GS, which decouples reconstruction from forecasting by learning a Gaussian trajectory interpolation model and a Transformer-based latent ODE that evolves a latent state $z(t)$ with $\dot{z}=f_\theta(z)$ and decodes to Gaussian parameters. The method employs dynamic trajectory sampling and adaptive regularization to enforce smooth continuous trajectories, achieving state-of-the-art results on D-NeRF, NVFi, and HyperNeRF. This enables rendering at arbitrary future timestamps with robust performance, offering practical benefits for robotics, augmented reality, and autonomous systems.

Abstract

We introduce ODE-GS, a novel approach that integrates 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to enable future extrapolation of dynamic 3D scenes. Unlike existing dynamic scene reconstruction methods, which rely on time-conditioned deformation networks and are limited to interpolation within a fixed time window, ODE-GS eliminates timestamp dependency by modeling Gaussian parameter trajectories as continuous-time latent dynamics. Our approach first learns an interpolation model to generate accurate Gaussian trajectories within the observed window, then trains a Transformer encoder to aggregate past trajectories into a latent state evolved via a neural ODE. Finally, numerical integration produces smooth, physically plausible future Gaussian trajectories, enabling rendering at arbitrary future timestamps. On the D-NeRF, NVFi, and HyperNeRF benchmarks, ODE-GS achieves state-of-the-art extrapolation performance, improving metrics by 19.8% compared to leading baselines, demonstrating its ability to accurately represent and predict 3D scene dynamics.
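
The forecasting step described in the abstract (a latent state evolved by a neural ODE, then decoded to Gaussian parameters at arbitrary future times) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the tiny random MLP stands in for the learned dynamics $f_\theta$, the linear `decoder` stands in for the learned decoder, `z0` would in practice come from the Transformer encoder, and classical RK4 is just one possible integrator rather than necessarily the solver used in the paper.

```python
import numpy as np

def make_dynamics(latent_dim, hidden_dim, rng):
    """Return f(z): a tiny tanh MLP standing in for the learned f_theta (hypothetical)."""
    w1 = rng.normal(scale=0.1, size=(latent_dim, hidden_dim))
    w2 = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))

    def f(z):
        return np.tanh(z @ w1) @ w2

    return f

def rk4_rollout(f, z0, times):
    """Integrate dz/dt = f(z) with classical RK4, starting from z0 at times[0].

    Returns an array with one latent state per entry of `times`.
    """
    zs = [z0]
    z = z0
    for t0, t1 in zip(times[:-1], times[1:]):
        h = t1 - t0  # step sizes may be non-uniform
        k1 = f(z)
        k2 = f(z + 0.5 * h * k1)
        k3 = f(z + 0.5 * h * k2)
        k4 = f(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        zs.append(z)
    return np.stack(zs)

rng = np.random.default_rng(0)
# Illustrative sizes only; param_dim = 3 (position) + 4 (rotation) + 3 (scale).
latent_dim, hidden_dim, param_dim = 8, 16, 10
f = make_dynamics(latent_dim, hidden_dim, rng)
decoder = rng.normal(scale=0.1, size=(latent_dim, param_dim))  # placeholder linear decoder

z0 = rng.normal(size=(latent_dim,))              # placeholder for the encoder output
future_times = np.array([1.0, 1.05, 1.2, 1.5])   # arbitrary query timestamps
latent_path = rk4_rollout(f, z0, future_times)   # shape (4, latent_dim)
gaussian_params = latent_path @ decoder          # shape (4, param_dim)
```

Because the dynamics depend only on the latent state and not on an explicit timestamp, the same rollout can be queried at any set of increasing times, which is what allows rendering beyond the observed window.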

Paper Structure

This paper contains 19 sections, 25 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Unlike existing methods that focus on interpolation, i.e., reconstructing novel scene views at unseen timestamps within the observed time window, we focus on extrapolation, i.e., extending scene dynamics beyond the observed times, by first training a representation of the observed scene and then using a sequence-to-sequence model to reconstruct future novel views via latent ODE dynamics.
  • Figure 2: (1) We initialize temporal trajectories of 3D Gaussian parameters using the frozen interpolation model, which consists of the canonical 3D Gaussian set and a time-conditioned deformation MLP. These trajectories lie entirely within the observed temporal window. (2) Through our dynamic sampling strategy, each Gaussian trajectory is split into multiple pairs of an observed prefix (input) and a held-out suffix (target), providing training pairs for the Transformer latent ODE. (3) Latent-ODE training encodes the observed prefix with a Transformer, infers a latent initial state, and evolves it forward with a neural ODE. (4) A decoder maps the latent path back to Gaussian parameters, which are supervised against the ground-truth suffixes via an L1 loss and smoothness regularizers.
  • Figure 3: Qualitative results on 5 scenes from the D-NeRF dataset. From left to right: the ground-truth (GT) image, the Deformable GS [yang2023deformable] rendering, the residual of Deformable GS against GT, the GaussianPrediction [zhao2024gaussianprediction] rendering, the residual of GaussianPrediction against GT, and finally our rendering and its residual against GT.
  • Figure 4: Ablation study: average results over the NVFi dataset.
  • Figure 5: Qualitative results on 5 scenes from the NVFi [li2023nvfi] dataset. From left to right: the ground-truth (GT) image, the Deformable GS [yang2023deformable] rendering, the residual of Deformable GS against GT, the GaussianPrediction [zhao2024gaussianprediction] rendering, the residual of GaussianPrediction against GT, and finally our rendering and its residual against GT.
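
The training-pair construction and supervision described in the Figure 2 caption (prefix/suffix sampling, then an L1 loss with smoothness regularizers) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's procedure: the uniformly random window start, the fixed `prefix_len` and `suffix_len`, the second-difference penalty, and the constant `weight` are all hypothetical stand-ins for the dynamic sampling strategy and adaptive regularization, whose details the caption does not give.

```python
import numpy as np

def sample_prefix_suffix(trajectory, prefix_len, suffix_len, rng):
    """Cut one contiguous (prefix, suffix) pair from a trajectory of shape (T, D).

    Assumption: the window start is drawn uniformly at random.
    """
    total = prefix_len + suffix_len
    start = rng.integers(0, trajectory.shape[0] - total + 1)  # upper bound is exclusive
    prefix = trajectory[start:start + prefix_len]             # observed input
    suffix = trajectory[start + prefix_len:start + total]     # held-out target
    return prefix, suffix

def l1_with_smoothness(pred, target, weight=0.1):
    """L1 reconstruction error plus a penalty on second differences of the prediction.

    Assumption: the second-difference term is one simple smoothness regularizer,
    and `weight` is a hypothetical fixed coefficient.
    """
    l1 = np.abs(pred - target).mean()
    accel = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]  # discrete acceleration
    smooth = np.abs(accel).mean() if accel.size else 0.0
    return l1 + weight * smooth

rng = np.random.default_rng(0)
# Toy trajectory: 20 time steps of a 3-D parameter moving along a straight line.
trajectory = np.linspace(0.0, 1.0, 20)[:, None] * np.ones((1, 3))
prefix, suffix = sample_prefix_suffix(trajectory, prefix_len=8, suffix_len=4, rng=rng)

perfect_loss = l1_with_smoothness(suffix, suffix)      # exact prediction of a linear path
offset_loss = l1_with_smoothness(suffix + 1.0, suffix)  # constant error of 1 everywhere
```

Calling `sample_prefix_suffix` repeatedly on the same trajectory yields multiple training pairs per Gaussian, each of which would be fed to the encoder (prefix) and compared against the decoded rollout (suffix).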