ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar
TL;DR
ICP-Flow is a learning-free LiDAR scene flow estimator that enforces a multi-object rigid-motion prior: after ego-motion compensation and ground removal, it applies ICP to pairs of point clusters. A histogram-based initialization seeds ICP with the most likely translation, yielding robust per-cluster transformations $\mathbf{T}_k \in SE(3)$ from which per-point scene flow is recovered. A feedforward network trained on the ICP-derived pseudo-labels then enables real-time inference. The approach outperforms state-of-the-art baselines, including supervised models, on Waymo and is competitive on Argoverse-v2 and nuScenes. With a tracker variant, it remains effective at longer temporal gaps, up to $\Delta t = 0.4$ s. By reducing dependence on large annotated datasets and heavy training, ICP-Flow offers practical, fast, geometry-driven scene flow suitable for autonomous driving perception pipelines. A direction for future work is integrating geometric and semantic cues within a unified framework to further improve robustness and accuracy.
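To make the step from per-cluster transformations to per-point flow concrete, here is a short worked form. The symbols $\mathbf{R}_k$, $\mathbf{t}_k$, $\mathbf{p}_i$, and $\mathbf{f}_i$ are notation assumed for this illustration rather than taken from the summary, and the points are assumed to be expressed in the ego-motion-compensated frame.

$$
\mathbf{T}_k = \begin{bmatrix} \mathbf{R}_k & \mathbf{t}_k \\ \mathbf{0}^\top & 1 \end{bmatrix}, \qquad \mathbf{f}_i = \mathbf{R}_k \mathbf{p}_i + \mathbf{t}_k - \mathbf{p}_i \quad \text{for every point } \mathbf{p}_i \text{ in cluster } k.
$$

Under this reading, all points in a cluster share one transformation, so in the special case of a pure translation ($\mathbf{R}_k = \mathbf{I}$) every point in the cluster receives the same flow vector $\mathbf{t}_k$.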
Abstract
Scene flow characterizes the 3D motion between two LiDAR scans captured by an autonomous vehicle at nearby timesteps. Prevalent methods treat scene flow as point-wise unconstrained flow vectors that can be learned either by large-scale training beforehand or by time-consuming optimization at inference. However, these methods do not take into account that objects in autonomous driving often move rigidly. We incorporate this rigid-motion assumption into our design, where the goal is to associate objects over scans and then estimate the locally rigid transformations. We propose ICP-Flow, a learning-free flow estimator. The core of our design is the conventional Iterative Closest Point (ICP) algorithm, which aligns the objects over time and outputs the corresponding rigid transformations. Crucially, to aid ICP, we propose a histogram-based initialization that discovers the most likely translation, thus providing a good starting point for ICP. The complete scene flow is then recovered from the rigid transformations. We outperform state-of-the-art baselines, including supervised models, on the Waymo dataset and perform competitively on Argoverse-v2 and nuScenes. Further, we train a feedforward neural network, supervised by the pseudo labels from our model, and achieve top performance among all models capable of real-time inference. We validate the advantage of our model on scene flow estimation with longer temporal gaps, up to 0.4 seconds, where other models fail to deliver meaningful results.
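The histogram-based initialization can be illustrated with a small NumPy sketch. This is one plausible reading of the idea, not the authors' implementation: it assumes the histogram is built over pairwise displacements between the points of two matched clusters, and the function name `histogram_translation_init` and the parameters `bin_size` and `max_shift` are hypothetical. The ICP refinement that would follow is not shown.

```python
import numpy as np

def histogram_translation_init(src, dst, bin_size=0.1, max_shift=5.0):
    """Return the most-voted translation between two clusters (hypothetical helper).

    src: (N, 3) points of a cluster in the first scan.
    dst: (M, 3) points of the candidate matching cluster in the second scan.
    bin_size, max_shift: assumed histogram resolution and search range, in meters.
    """
    # All pairwise displacements dst_j - src_i, flattened to shape (N*M, 3).
    diffs = (dst[None, :, :] - src[:, None, :]).reshape(-1, 3)

    # Ignore displacements outside the assumed search range.
    keep = np.all(np.abs(diffs) <= max_shift, axis=1)
    diffs = diffs[keep]
    if len(diffs) == 0:
        return np.zeros(3)

    # Quantize each displacement to an integer 3D bin index.
    bins = np.round(diffs / bin_size).astype(np.int64)

    # Count votes per occupied bin and pick the peak.
    uniq, counts = np.unique(bins, axis=0, return_counts=True)
    return uniq[np.argmax(counts)] * bin_size


# Usage: a synthetic cluster shifted by a known translation.
rng = np.random.default_rng(0)
src = rng.uniform([-1.0, -1.0, 0.0], [1.0, 1.0, 1.0], size=(50, 3))
true_t = np.array([1.0, -0.5, 0.0])
dst = src + true_t
t_init = histogram_translation_init(src, dst)
```

In this toy case every true correspondence votes for the same bin while mismatched pairs scatter their votes, which is why the peak gives a reasonable starting translation for a subsequent ICP refinement.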
