ICP-Flow: LiDAR Scene Flow Estimation with ICP

Yancong Lin, Holger Caesar

TL;DR

ICP-Flow is a learning-free LiDAR scene flow estimator that enforces a multi-object rigid-motion prior: after ego-motion compensation and ground removal, it applies ICP to pairs of clustered points. A histogram-based initialization seeds ICP, yielding robust per-cluster transformations ${\mathbf{T}}_k \in SE(3)$ from which per-point scene flow is recovered; a feedforward network trained on the ICP-derived pseudo-labels then provides real-time inference. The approach achieves competitive or superior results on Waymo, Argoverse-v2, and nuScenes compared with both unsupervised and supervised baselines and, with a tracker variant, extends gracefully to longer temporal horizons of up to ${\Delta t}=0.4$ s. By reducing dependence on large annotated datasets and heavy training, ICP-Flow offers practical, fast, geometry-driven scene flow suitable for autonomous driving perception pipelines. Future work could integrate geometric and semantic cues within a unified framework to further improve robustness and accuracy.
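
To make the recovery of per-point flow from a per-cluster transformation concrete, the relation can be written out as below. The symbols $\mathbf{R}_k$, $\mathbf{t}_k$, $\mathbf{x}_i$, $\mathbf{f}_i$ and $\mathbf{C}_k$ are notation assumed here for illustration; this is the standard rigid-motion relation, not a formula quoted from the paper.

$$
\mathbf{T}_k = \begin{bmatrix} \mathbf{R}_k & \mathbf{t}_k \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \in SE(3),
\qquad
\mathbf{f}_i = \mathbf{R}_k \mathbf{x}_i + \mathbf{t}_k - \mathbf{x}_i \quad \text{for every point } \mathbf{x}_i \in \mathbf{C}_k .
$$

For a purely translating cluster, $\mathbf{R}_k = \mathbf{I}$, so every point in the cluster receives the same flow vector $\mathbf{f}_i = \mathbf{t}_k$.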

Abstract

Scene flow characterizes the 3D motion between two LiDAR scans captured by an autonomous vehicle at nearby timesteps. Prevalent methods consider scene flow as point-wise unconstrained flow vectors that can be learned by either large-scale training beforehand or time-consuming optimization at inference. However, these methods do not take into account that objects in autonomous driving often move rigidly. We incorporate this rigid-motion assumption into our design, where the goal is to associate objects over scans and then estimate the locally rigid transformations. We propose ICP-Flow, a learning-free flow estimator. The core of our design is the conventional Iterative Closest Point (ICP) algorithm, which aligns the objects over time and outputs the corresponding rigid transformations. Crucially, to aid ICP, we propose a histogram-based initialization that discovers the most likely translation, thus providing a good starting point for ICP. The complete scene flow is then recovered from the rigid transformations. We outperform state-of-the-art baselines, including supervised models, on the Waymo dataset and perform competitively on Argoverse-v2 and nuScenes. Further, we train a feedforward neural network, supervised by the pseudo labels from our model, and achieve top performance among all models capable of real-time inference. We validate the advantage of our model on scene flow estimation with longer temporal gaps, up to 0.4 seconds where other models fail to deliver meaningful results.
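
The histogram-based initialization can be illustrated with a small sketch. The abstract only states that it discovers the most likely translation, so the details below are assumptions: points are 2D (bird's-eye view), every source/target pair casts one vote for its displacement, and the function name `histogram_translation` and the `bin_size` parameter are hypothetical.

```python
from collections import Counter
from itertools import product


def histogram_translation(src, dst, bin_size=0.1):
    """Return the most-voted 2D translation from src to dst.

    Assumed voting scheme (not taken from the paper): each (source, target)
    pair proposes the displacement target - source, displacements are
    quantized into square bins of side `bin_size`, and the fullest bin wins.
    Both point lists must be non-empty. Cost is O(len(src) * len(dst)).
    """
    votes = Counter()
    for (sx, sy), (dx, dy) in product(src, dst):
        # Quantize the candidate displacement to an integer bin index.
        key = (round((dx - sx) / bin_size), round((dy - sy) / bin_size))
        votes[key] += 1
    (bx, by), _ = votes.most_common(1)[0]
    # Convert the winning bin index back to metric units.
    return (bx * bin_size, by * bin_size)


# Toy cluster and a copy shifted by (1.5, -0.5).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
dst = [(x + 1.5, y - 0.5) for x, y in src]
init_t = histogram_translation(src, dst)
```

In this toy case the four true correspondences all vote for the same bin, while each mismatched pair lands in a bin of its own, so the peak of the histogram is the true shift. The recovered translation would then serve as the starting point for ICP.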

Paper Structure (31 sections, 2 equations, 8 figures, 7 tables)


Figures (8)

  • Figure 1: ICP for scene flow. Given two LiDAR scans, we remove the ground, cluster points, and align clusters using ICP, as objects move rigidly. We infer a rigid transformation for each pair of clusters, from which the scene flow can be recovered. Further, we train a feedforward network using the prediction from our model as supervision. The network runs in real-time with only marginal performance loss.
  • Figure 2: Overview of ICP-Flow. Given two full-size LiDAR scans as input, we first perform ego-motion compensation and ground removal on each scan. Subsequently, we fuse the non-ground points from both scans and group them into a set of clusters. We pair clusters by spatial locality and feed them to ICP matching for further verification and transformation estimation. We then filter unreliable matches and associate clusters over time. The scene flow is recovered by using the rigid-motion assumption. Crucially, to aid ICP matching, we develop a histogram-based voting strategy for initialization that exploits motion rigidity.
  • Figure 3: Scene flow errors with increasing time gap. We show the EPE values for dynamic foreground with respect to the time duration. As the time gap increases, Ours degrades gracefully, and the gap to PCA [huang2022dynamic], a supervised model designed for this task, remains marginal up to 0.3 seconds. In contrast, other methods fail to generalize to a longer duration. Ours+Tracker, an extension of Ours that tracks clusters over time, achieves comparable results without relying on learning from costly annotation.
  • Figure 4: ICP with centroid alignment. We show a pair of associated clusters in (a), colored in green and blue respectively. They show a bird's-eye view of a moving truck. ICP fails (d) when it is initialized by simply subtracting the centroids (c).
  • Figure 5: Visualization of predicted scene flow. We qualitatively compare our prediction to the ground truth. For better visualization, we crop the region of interest from the entire scan. We plot the input scans at time $t$ and $t+\Delta t$, namely $\textbf{X}_{t}$ and $\textbf{X}_{t+\Delta t}$, in green and blue, respectively. We color the flow-compensated scan at time $t$, namely $\textbf{X}_{t}^{\prime}$, in purple; it is obtained by adding the predicted scene flow $\textbf{F}_{t}$ to $\textbf{X}_{t}$. For comparison, we use red to indicate the flow-compensated scan at time $t$ obtained by adding the ground truth flow, namely $\textbf{X}_{t}^{*}$. The left figure is composed of $\textbf{X}_{t}$, $\textbf{X}_{t+\Delta t}$ and $\textbf{X}_{t}^{\prime}$. The prediction of ICP-Flow is reasonable wherever the blue and purple points align (i.e., overlap) with each other. However, ICP-Flow fails in certain scenarios by associating the wrong clusters, as indicated by the box at the top. We highlight this failure in the right figure, where ✗ denotes a wrong association. As indicated by the dashed lines on the left, ICP-Flow associates clusters $1$ and $2$ (or $\textbf{C}_{1}^{t}$ and $\textbf{C}_{2}^{t+\Delta t}$), and estimates a transformation that best aligns them. Unfortunately, $\textbf{C}_{1}^{t}$ remains static within $\Delta t$ according to the ground truth (in red). Similarly, we observe that $\textbf{C}_{3}^{t}$ and $\textbf{C}_{4}^{t+\Delta t}$ are also falsely associated. Interestingly, after careful examination, we find that this is an annotation error in the preprocessed Waymo dataset [huang2022dynamic], as explained in Fig. \ref{fig:failure_353_raw}.
  • ...and 3 more figures
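
The matching and flow-recovery steps described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it works in 2D (bird's-eye view) rather than full $SE(3)$, uses brute-force nearest neighbours, and all function names (`fit_rigid_2d`, `icp_2d`, `rigid_flow`) are hypothetical. The initial translation `init_t` stands in for the output of the histogram-based voting.

```python
import math


def apply_rigid(theta, t, pts):
    """Rotate 2D points by theta (radians) about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n)
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - cs[0], sy - cs[1]  # centered source point
        bx, by = dx - cd[0], dy - cd[1]  # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return theta, t


def icp_2d(src, dst, init_t=(0.0, 0.0), iters=20):
    """Alternate nearest-neighbour matching and rigid fitting, seeded by init_t."""
    theta, t = 0.0, init_t
    for _ in range(iters):
        moved = apply_rigid(theta, t, src)
        matches = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in moved
        ]
        theta, t = fit_rigid_2d(src, matches)
    return theta, t


def rigid_flow(theta, t, pts):
    """Per-point flow implied by a rigid transform: transformed point minus point."""
    moved = apply_rigid(theta, t, pts)
    return [(mx - x, my - y) for (x, y), (mx, my) in zip(pts, moved)]


# L-shaped toy cluster and a copy translated by (2.0, 0.5).
src = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
dst = [(x + 2.0, y + 0.5) for x, y in src]
theta, t = icp_2d(src, dst, init_t=(1.8, 0.4))  # rough seed near the true shift
flow = rigid_flow(theta, t, src)
```

With a seed this close to the true shift, every point is matched to its real counterpart in the first iteration, which is why a good initialization matters: nearest-neighbour matching from a poor start can lock onto wrong correspondences.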