Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering
Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann
TL;DR
Let-It-Flow presents a self-supervised approach for 3D scene flow that jointly optimizes per-point motion and object clustering in LiDAR sequences. It introduces two cluster types, overlapping soft clusters and hard growing clusters, together with two rigidity losses and a distance loss that enforce physically plausible motion across multiple independently moving objects. By dropping neural-prior regularization and using an optimization-based framework, it achieves state-of-the-art results on Argoverse2, Waymo, and StereoKITTI, with notable improvements for pedestrians and cyclists. The method requires no training data and scales to complex dynamic scenes while remaining GPU-friendly through overlapping neighborhood computations.
Abstract
We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks such as trajectory prediction and instance segmentation. In the absence of ground-truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identify the structural constraints and the use of large and strict rigid clusters as the main pitfalls of current approaches, and we propose a novel clustering approach that combines overlapping soft clusters with non-overlapping rigid clusters. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed-size overlapping soft clusters. We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, which include pedestrians, cyclists, and other vulnerable road users. Our code is publicly available at https://github.com/ctu-vras/let-it-flow.
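To make the two kinds of terms mentioned above more concrete, the following is a minimal illustrative sketch and not the formulation used in the paper. It assumes a nearest-neighbor distance loss between the warped source cloud and the target cloud, and a soft rigidity loss that penalizes changes in point-to-neighbor distances inside overlapping k-nearest-neighbor neighborhoods. The function names (`knn_indices`, `warp`, `distance_loss`, `soft_rigidity_loss`), the choice of `k`, and the exact loss forms are all assumptions made for illustration; the growing hard clusters are not modeled here.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def warp(points, flow):
    """Apply a per-point flow vector to every point."""
    return [tuple(c + d for c, d in zip(p, f)) for p, f in zip(points, flow)]


def distance_loss(source, flow, target):
    """Mean distance from each warped source point to its nearest target point."""
    warped = warp(source, flow)
    return sum(min(math.dist(w, t) for t in target) for w in warped) / len(warped)


def soft_rigidity_loss(source, flow, k=2):
    """Mean absolute change in point-to-neighbour distance caused by the flow.

    Neighbourhoods are built per point, so they overlap; a rigid motion of a
    neighbourhood leaves all of these distances unchanged and gives zero loss.
    """
    warped = warp(source, flow)
    neighbours = knn_indices(source, k)
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            before = math.dist(source[i], source[j])
            after = math.dist(warped[i], warped[j])
            total += abs(after - before)
            count += 1
    return total / count


# Toy example: a unit square translated by one unit along x.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
target = [(x + 1.0, y, z) for x, y, z in source]
rigid_flow = [(1.0, 0.0, 0.0)] * 4
broken_flow = [(0.0, 0.0, 0.0)] + [(1.0, 0.0, 0.0)] * 3

print(distance_loss(source, rigid_flow, target), soft_rigidity_loss(source, rigid_flow))
print(distance_loss(source, broken_flow, target), soft_rigidity_loss(source, broken_flow))
```

In this toy setup the rigid translation drives both terms to zero, while a flow that leaves one point behind is penalized by the rigidity term. An optimization-based method in this spirit would minimize a weighted sum of such terms directly over the flow vectors, typically with a differentiable tensor library on the GPU rather than the brute-force loops used here.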
