Neural Eulerian Scene Flow Fields
Kyle Vedder, Neehar Peri, Ishan Khatri, Siyi Li, Eric Eaton, Mehmet Kocamaz, Yue Wang, Zhiding Yu, Deva Ramanan, Joachim Pehserl
TL;DR
We reformulate scene flow as estimating a continuous space-time ODE represented by a neural prior (the SFvODE formulation). We instantiate this with EulerFlow, a simple unsupervised method that optimizes over full observation sequences using multi-frame reconstruction and cycle-consistency losses. EulerFlow achieves state-of-the-art unsupervised performance on the Argoverse 2 2024 Scene Flow Challenge and the Waymo Open Scene Flow benchmark, including robust motion estimation for small, fast-moving objects and emergent 3D point tracking via Euler integration. The approach generalizes beyond autonomous driving to dynamic tabletop scenes and can use monocular depth estimates to handle RGB-only data, suggesting that the Eulerian ODE paradigm applies broadly to dense motion estimation. Future directions include autoregressive, multi-step feedforward extensions and multi-modal fusion to address current limitations such as sparse point clouds and high computational cost.
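To make the two loss families concrete, here is a minimal sketch for a single pair of frames. It assumes a symmetric Chamfer distance as the reconstruction term and an L2 round-trip penalty as the cycle-consistency term. Both are common choices in unsupervised scene flow, not necessarily EulerFlow's exact objectives. The names `chamfer`, `pair_losses`, `flow_fwd`, and `flow_bwd` are hypothetical. `flow_fwd` and `flow_bwd` stand in for querying the neural prior forward and backward in time. A multi-frame objective would accumulate such terms over several integration steps.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
Flow = Callable[[Point], Point]  # hypothetical stand-in for the neural prior


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _add(p: Point, d: Point) -> Point:
    """Displace point p by vector d."""
    return (p[0] + d[0], p[1] + d[1], p[2] + d[2])


def chamfer(src: List[Point], dst: List[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor squared distance, both directions."""
    fwd = sum(min(_sq_dist(p, q) for q in dst) for p in src) / len(src)
    bwd = sum(min(_sq_dist(q, p) for p in src) for q in dst) / len(dst)
    return fwd + bwd


def pair_losses(cloud_t: List[Point], cloud_t1: List[Point],
                flow_fwd: Flow, flow_bwd: Flow) -> Tuple[float, float]:
    """Return (reconstruction, cycle) losses for frames t and t+1 (assumed loss forms)."""
    # Reconstruction: advect frame t forward and compare against observed frame t+1.
    advected = [_add(p, flow_fwd(p)) for p in cloud_t]
    recon = chamfer(advected, cloud_t1)
    # Cycle consistency: flowing forward then backward should return each point home.
    returned = [_add(q, flow_bwd(q)) for q in advected]
    cycle = sum(_sq_dist(p, r) for p, r in zip(cloud_t, returned)) / len(cloud_t)
    return recon, cycle
```

For a rigid 0.5 m shift along x, the matching forward and backward flows drive both terms to zero. A zero flow leaves the reconstruction term positive.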
Abstract
We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior. Our method, EulerFlow, optimizes this neural prior estimate against several multi-observation reconstruction objectives, enabling high-quality scene flow estimation via pure self-supervision on real-world data. EulerFlow works out of the box, without tuning, across multiple domains, including large-scale autonomous driving scenes and dynamic tabletop settings. Remarkably, EulerFlow produces high-quality flow estimates on small, fast-moving objects like birds and tennis balls, and exhibits emergent 3D point tracking behavior by solving its estimated ODE over long time horizons. On the Argoverse 2 2024 Scene Flow Challenge, EulerFlow outperforms all prior art, surpassing the next-best unsupervised method by more than 2.5x and even exceeding the next-best supervised method by over 10%.
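The emergent point tracking comes from solving the estimated ODE over time. The sketch below is a minimal forward-Euler integrator for dp/dt = v(p, t). It assumes a hypothetical `VelocityField` callable in place of EulerFlow's learned neural prior; in practice, the ODE would be discretized at the sensor's frame timestamps. Chaining many small steps turns per-step flow into a long-horizon 3D trajectory for each query point.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical stand-in for the learned neural prior: maps (position, time) to velocity.
VelocityField = Callable[[Point, float], Point]


def euler_track(field: VelocityField, start: Point, t0: float, t1: float,
                num_steps: int) -> List[Point]:
    """Forward-Euler integration of dp/dt = field(p, t) from t0 to t1.

    Returns the full trajectory, including the start point (num_steps + 1 entries).
    """
    dt = (t1 - t0) / num_steps
    p, t = start, t0
    trajectory = [p]
    for _ in range(num_steps):
        v = field(p, t)  # query the velocity at the current position and time
        p = (p[0] + dt * v[0], p[1] + dt * v[1], p[2] + dt * v[2])
        t += dt
        trajectory.append(p)
    return trajectory


# Usage: a point moving at a constant 1 m/s along x, tracked for 1 s in 10 steps.
track = euler_track(lambda p, t: (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, 1.0, 10)
```

The scene flow between two consecutive steps is the difference of adjacent trajectory entries. A long track is simply the composition of those per-step displacements.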
