SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving

Qingwen Zhang, Yi Yang, Peizheng Li, Olov Andersson, Patric Jensfelt

TL;DR

SeFlow is a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline and achieves state-of-the-art performance on the self-supervised scene flow task on the Argoverse 2 and Waymo datasets.

Abstract

Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. This detailed, point-level information can help autonomous vehicles accurately predict and understand dynamic changes in their surroundings. Current state-of-the-art methods require annotated data to train scene flow networks, and the expense of labeling inherently limits their scalability. Self-supervised approaches can overcome these limitations, yet face two principal challenges that hinder optimal performance: point distribution imbalance and disregard for object-level motion constraints. In this paper, we propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline. We demonstrate that classifying static and dynamic points helps design targeted objective functions for different motion patterns. We also emphasize the importance of internal cluster consistency and correct object point association to refine the scene flow estimation, in particular on object details. Our real-time capable method achieves state-of-the-art performance on the self-supervised scene flow task on the Argoverse 2 and Waymo datasets. The code is open-sourced at https://github.com/KTH-RPL/SeFlow along with trained model weights.

Paper Structure (40 sections, 13 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: LiDAR scene flow estimation using our SeFlow method on Argoverse 2. The predicted scene flow for each point is color-coded based on direction. White indicates static points whose flow is zero. More saturated colors indicate higher velocities. (a) Camera view for visualization purposes only. (b), (c) Zoomed-in views showing the ZeroFlow baseline and SeFlow (ours). When predicting the flow of a large and long vehicle, the baseline predicts a portion of the flow as 0, whereas our estimates are consistent. In addition, the baseline tends to ignore small-scale objects, e.g., pedestrians, while our method can better handle such small and slow-moving objects.
  • Figure 2: SeFlow Architecture. Top: With two consecutive point clouds as inputs, our model predicts the estimated flows of all points. Bottom: Conceptual visualization of the Chamfer loss and the three proposed training losses. With the original input $\mathcal{P}_t$ (2 static points for the building, and 2 dynamic points for the car) plus the estimated flow $\hat{\mathcal{F}}_t$, we can calculate the error between estimated $\hat{\mathcal{P}}_{t+1}$ and the next frame point cloud $\mathcal{P}_{t+1}$ ($\mathcal{L}_{\text{cham}}$). The second part is $\mathcal{L}_{\text{dcham}}$ that only calculates the distance error between dynamic points. The third loss says that the estimated flows of static points should be zero. Finally, we assume that the flow at points from the same cluster should be consistent, and mitigate underestimation by using the proposed upper bound on the flow.
  • Figure 3: Simple visualization of the shortcomings of using Chamfer distance as a supervisory signal for flow value estimation. The denser color on points in (b) represents higher flow values and white means the point's flow is zero. (a) illustrates how to calculate loss based on Chamfer distance, and (b) shows that the flow results, based on the nearest neighbor principle, can lead to zero flow estimation for the middle of the object.
  • Figure 4: The relationship between flow estimation error and training dataset size, plotted on a $\log_{10}$ scale. Methods with $^\star$ are supervised by ground truth labels. SeFlow uses less data yet achieves results comparable to FastFlow3D and ZeroFlow.
  • Figure 5: Qualitative results from the Argoverse 2 validation set. The top row displays the ground truth flow, the middle row presents the SeFlow result, and the bottom row shows the result of another self-supervised method, ZeroFlow. Different colors indicate different directions, and more saturated colors mean larger flow estimates. Ego motion is compensated for a clearer view.
  • ...and 3 more figures
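
The loss ideas described in the captions of Figures 2 and 3 can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the use of squared nearest-neighbor distances, the equal weighting of both Chamfer directions, and the toy point coordinates are all assumptions made for illustration, and the cluster-consistency loss is left out. The toy scene mirrors Figure 2, with 2 static points (building) and 2 dynamic points (car).

```python
import numpy as np

def nearest_sq_dist(src, dst):
    """Squared distance from each point in src to its nearest neighbor in dst."""
    diff = src[:, None, :] - dst[None, :, :]        # shape (N, M, 3)
    return (diff ** 2).sum(axis=-1).min(axis=1)     # shape (N,)

def chamfer_loss(warped, target):
    """Symmetric Chamfer distance (assumed form: sum of the two directional means)."""
    return float(nearest_sq_dist(warped, target).mean()
                 + nearest_sq_dist(target, warped).mean())

def dynamic_chamfer_loss(warped, target, dyn_src, dyn_dst):
    """Chamfer distance restricted to points classified as dynamic in each frame."""
    return chamfer_loss(warped[dyn_src], target[dyn_dst])

def static_flow_loss(flow, is_dynamic):
    """Mean flow magnitude over static points, which should be zero."""
    static = flow[~is_dynamic]
    if static.shape[0] == 0:
        return 0.0
    return float(np.linalg.norm(static, axis=1).mean())

# Hypothetical toy scene: two building points, two car points moving +1 m in x.
p_t = np.array([[0, 0, 0], [0, 1, 0], [5, 0, 0], [6, 0, 0]], dtype=float)
is_dynamic = np.array([False, False, True, True])
true_flow = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
p_t1 = p_t + true_flow                              # next-frame point cloud

zero_flow = np.zeros_like(p_t)                      # a degenerate prediction
loss_zero = chamfer_loss(p_t + zero_flow, p_t1)
loss_true = chamfer_loss(p_t + true_flow, p_t1)
dcham_zero = dynamic_chamfer_loss(p_t + zero_flow, p_t1, is_dynamic, is_dynamic)
per_point_zero = nearest_sq_dist(p_t + zero_flow, p_t1)

print("Chamfer, zero flow:", loss_zero)
print("Chamfer, true flow:", loss_true)
print("Dynamic Chamfer, zero flow:", dcham_zero)
print("Per-point error, zero flow:", per_point_zero)
print("Static loss, true flow:", static_flow_loss(true_flow, is_dynamic))
```

In this toy scene the all-zero prediction yields a Chamfer loss of 0.5, while the same prediction restricted to dynamic points yields 1.0, since the static points no longer dilute the error. The per-point error of the car point at x = 6 is zero under the zero-flow prediction because it coincides with a next-frame point, which is the nearest-neighbor ambiguity sketched in Figure 3.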