Neural Eulerian Scene Flow Fields

Kyle Vedder, Neehar Peri, Ishan Khatri, Siyi Li, Eric Eaton, Mehmet Kocamaz, Yue Wang, Zhiding Yu, Deva Ramanan, Joachim Pehserl

TL;DR

We reformulate scene flow as learning a continuous space-time ODE described by a neural prior (SFvODE), and instantiate this with EulerFlow, a simple unsupervised method that optimizes over full observation sequences using multi-frame reconstruction and cycle-consistency losses. EulerFlow achieves state-of-the-art unsupervised performance on the Argoverse 2 2024 Scene Flow Challenge and Waymo Open Scene Flow benchmarks, including robust motion estimation for small, fast-moving objects and emergent 3D point tracking via Euler integration. The approach generalizes beyond autonomous driving to dynamic tabletop scenes and can leverage monocular depth to handle RGB-only data, suggesting broad applicability of the Eulerian ODE paradigm for dense motion estimation. Future directions include autoregressive, multi-step feedforward extensions and multi-modal fusion to address current limitations such as sparse point clouds and computational cost.
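As a rough illustration of the cycle-consistency idea mentioned above, the sketch below steps points forward in time along a velocity field and then backward again, and measures how far each point lands from where it started. This is a minimal sketch under stated assumptions, not EulerFlow's actual loss: the names `euler_step`, `cycle_residual`, and `ramp_field` are hypothetical, the field is a hand-written function rather than a neural prior, and the exact form of the objective is not given in this summary.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
Field = Callable[[Point, float], Point]  # (position, time) -> velocity


def euler_step(field: Field, p: Point, t: float, dt: float) -> Point:
    """Move p along the field for one Euler step of size dt (dt may be negative)."""
    v = field(p, t)
    return (p[0] + dt * v[0], p[1] + dt * v[1], p[2] + dt * v[2])


def cycle_residual(field: Field, points: Sequence[Point], t: float, dt: float) -> float:
    """Mean distance between each point and its forward-then-backward round trip."""
    total = 0.0
    for p in points:
        forward = euler_step(field, p, t, dt)            # t -> t + dt
        back = euler_step(field, forward, t + dt, -dt)   # t + dt -> t
        total += math.dist(p, back)
    return total / len(points)


def ramp_field(p: Point, t: float) -> Point:
    """Hypothetical field whose x-velocity grows with time (an assumption for the demo)."""
    return (t, 0.0, 0.0)


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
    print(cycle_residual(ramp_field, pts, t=0.0, dt=0.5))
```

A field that is constant in space and time gives a residual of zero, while a field that changes between the two steps, such as `ramp_field`, gives a nonzero residual that an optimizer could penalize.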

Abstract

We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior. Our method, EulerFlow, optimizes this neural prior estimate against several multi-observation reconstruction objectives, enabling high-quality scene flow estimation via pure self-supervision on real-world data. EulerFlow works out of the box without tuning across multiple domains, including large-scale autonomous driving scenes and dynamic tabletop settings. Remarkably, EulerFlow produces high-quality flow estimates on small, fast-moving objects like birds and tennis balls, and exhibits emergent 3D point tracking behavior by solving its estimated ODE over long time horizons. On the Argoverse 2 2024 Scene Flow Challenge, EulerFlow outperforms all prior art, surpassing the next-best unsupervised method by more than 2.5x, and even exceeding the next-best supervised method by over 10%.
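The point tracking behavior described above comes from solving the estimated ODE forward in time. The sketch below shows forward Euler integration of a velocity field to produce a 3D point track. It is a minimal sketch, not the authors' implementation: `rotation_field` is a hypothetical analytic stand-in for the learned neural prior, and `euler_track`, the step count, and the time span are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
VelocityField = Callable[[Point, float], Point]  # (position, time) -> velocity


def rotation_field(p: Point, t: float) -> Point:
    """Hypothetical stand-in for a learned field: rotation about the z-axis at 1 rad/s."""
    x, y, _ = p
    return (-y, x, 0.0)


def euler_track(field: VelocityField, p0: Point, t0: float, t1: float, steps: int) -> List[Point]:
    """Integrate dp/dt = field(p, t) from t0 to t1 using `steps` forward Euler steps."""
    dt = (t1 - t0) / steps
    p, t = p0, t0
    track = [p]
    for _ in range(steps):
        v = field(p, t)
        p = (p[0] + dt * v[0], p[1] + dt * v[1], p[2] + dt * v[2])
        t += dt
        track.append(p)
    return track


if __name__ == "__main__":
    # A quarter turn: the point starting at (1, 0, 0.5) should end near (0, 1, 0.5),
    # up to the discretization error of forward Euler.
    track = euler_track(rotation_field, (1.0, 0.0, 0.5), 0.0, math.pi / 2, steps=1000)
    print(track[-1])
```

Forward Euler accumulates error over long horizons (here the radius drifts slightly outward), so smaller steps give tracks closer to the true solution of the ODE.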

Paper Structure (27 sections, 14 equations, 17 figures)

Figures (17)

  • Figure 1: EulerFlow is able to capture the motion of small, fast-moving objects with few lidar points, such as a bird flying in front of an autonomous vehicle (Figure \ref{fig:teasersceneflow}). EulerFlow's flexibility allows it to estimate scene flow for fast-moving tabletop objects without additional hyperparameter tuning (Figure \ref{fig:teaserbounce}). EulerFlow's ODE estimate exhibits emergent 3D point tracking behavior without explicit long-horizon supervision (Figure \ref{fig:teaserjack}). Note that point clouds are shown in color for visualization purposes only; RGB is not used during optimization.
  • Figure 2: We visualize an example of five pedestrians crossing the street in front of a stopped car, cherry-picked to have unusually high-density lidar returns, making it particularly easy to estimate flow. Figures \ref{fig:flyingbirdgigachad_twoframe}--\ref{fig:flyingbirdgroundtruth_twoframe} depict a two-frame flow visualization of EulerFlow and several strong baselines. Notably, visualizing flow over only two frames makes it difficult to distinguish flow quality. In contrast, Figures \ref{fig:flyingbirdgigachad_fullsequence}--\ref{fig:flyingbirdgroundtruth_fullsequence} depict flow vectors over the full sequence, making differences in quality clear; for example, EulerFlow is the only method without artifacts on the stopped car.
  • Figure 3: Overview of our Scene Flow via ODE framework, which estimates an ODE across the entire observation sequence by optimizing against multi-frame objectives. This ODE estimate is represented with a neural prior \cite{nsfp}, providing a flexible, general representation for describing position-time motion.
  • Figure 4: Comparison of Eulerian and Lagrangian descriptions of 2D flow. An Eulerian view characterizes a flow field via instantaneous velocities at many different points, while a Lagrangian view characterizes a flow field via trajectories of many different particles across time. Both approaches are valid ways of describing an underlying flow field, and with sufficient characterization one view can be readily converted to the other, but the Lagrangian view relies on the definition of a consistent canonical frame.
  • Figure 5: Mean Dynamic Normalized EPE of EulerFlow compared to prior art on the Argoverse 2 2024 Scene Flow Challenge test set. EulerFlow is state-of-the-art, beating all supervised (shown in black) and unsupervised (shown in white) methods. Lower is better.
  • ...and 12 more figures
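
The Eulerian and Lagrangian descriptions contrasted in the Figure 4 caption can be illustrated with a toy 2D flow. The sketch below samples velocities at fixed locations (Eulerian), follows one particle through time (Lagrangian), and converts the trajectory back to velocities by finite differences. It is a minimal sketch under stated assumptions: `shear_field` and all function names are hypothetical, and the field is a hand-written function rather than anything from the paper.

```python
from typing import Callable, List, Tuple

Vec2 = Tuple[float, float]
Field2D = Callable[[Vec2, float], Vec2]  # (position, time) -> velocity


def shear_field(p: Vec2, t: float) -> Vec2:
    """Hypothetical 2D field: horizontal velocity equal to the point's height."""
    return (p[1], 0.0)


def eulerian_samples(field: Field2D, points: List[Vec2], t: float) -> List[Vec2]:
    """Eulerian view: instantaneous velocities at fixed locations at time t."""
    return [field(p, t) for p in points]


def lagrangian_trajectory(field: Field2D, p0: Vec2, dt: float, steps: int) -> List[Vec2]:
    """Lagrangian view: follow one particle through time with forward Euler steps."""
    traj = [p0]
    p = p0
    for k in range(steps):
        v = field(p, k * dt)
        p = (p[0] + dt * v[0], p[1] + dt * v[1])
        traj.append(p)
    return traj


def velocities_from_trajectory(traj: List[Vec2], dt: float) -> List[Vec2]:
    """Convert back: finite differences along a trajectory give velocities at visited points."""
    return [((b[0] - a[0]) / dt, (b[1] - a[1]) / dt) for a, b in zip(traj, traj[1:])]


if __name__ == "__main__":
    print(eulerian_samples(shear_field, [(0.0, 0.0), (0.0, 1.0)], t=0.0))
    traj = lagrangian_trajectory(shear_field, (0.0, 2.0), dt=0.1, steps=10)
    print(traj[-1])
    print(velocities_from_trajectory(traj, dt=0.1)[0])
```

Note that the Lagrangian trajectory needs a starting particle and a starting time, which is one way to read the caption's remark that this view depends on a consistent canonical frame.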