Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

Kewei Wang, Yizheng Wu, Jun Cen, Zhiyu Pan, Xingyi Li, Zhe Wang, Zhiguo Cao, Guosheng Lin

TL;DR

This paper explores the feasibility of self-supervised motion prediction using only unlabeled LiDAR point clouds, and demonstrates significant improvements over state-of-the-art self-supervised methods.

Abstract

The perception of motion behavior in a dynamic environment holds significant importance for autonomous driving systems, wherein class-agnostic motion prediction methods directly predict the motion of the entire point cloud. While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming. Therefore, several annotation-efficient methods have been proposed to address this challenge. Although effective, these methods rely on weak annotations or additional multi-modal data like images, and the potential benefits inherent in the point cloud sequence are still underexplored. To this end, we explore the feasibility of self-supervised motion prediction with only unlabeled LiDAR point clouds. Initially, we employ an optimal transport solver to establish coarse correspondences between current and future point clouds as the coarse pseudo motion labels. Training models directly using such coarse labels leads to noticeable spatial and temporal prediction inconsistencies. To mitigate these issues, we introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively. Experimental results demonstrate the significant superiority of our approach over the state-of-the-art self-supervised methods.
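To make the pseudo-labeling step described in the abstract more concrete, the following is a minimal sketch of how an entropy-regularized optimal transport solver (Sinkhorn iterations) could turn two point sets into coarse displacement labels. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_pseudo_labels`, the squared Euclidean cost, the uniform point masses, and the `epsilon` and `n_iters` values are all hypothetical choices that do not come from the paper.

```python
import math


def sinkhorn_pseudo_labels(current, future, epsilon=0.5, n_iters=50):
    """Return one coarse displacement per current point (hypothetical helper).

    current, future: lists of equal-dimension points, e.g. [[x, y], ...].
    epsilon: entropic regularization strength (assumed value).
    """
    n, m = len(current), len(future)
    # Gibbs kernel built from the squared Euclidean cost (assumed cost).
    kernel = [
        [math.exp(-sum((c - f) ** 2 for c, f in zip(p, q)) / epsilon)
         for q in future]
        for p in current
    ]
    # Uniform mass on every point (assumption).
    a, b = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m
    # Sinkhorn iterations: alternately rescale rows and columns of the plan.
    # Note: very distant point sets can underflow the kernel to zero.
    for _ in range(n_iters):
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    labels = []
    for i, p in enumerate(current):
        # Row i of the transport plan gives soft correspondences for point i.
        weights = [u[i] * kernel[i][j] * v[j] for j in range(m)]
        total = sum(weights)
        # Barycentric projection: plan-weighted mean of the future points.
        target = [
            sum(w * q[d] for w, q in zip(weights, future)) / total
            for d in range(len(p))
        ]
        # Pseudo motion label = matched future position minus current position.
        labels.append([t - c for t, c in zip(target, p)])
    return labels


# Two well-separated points that both move by +1 along x.
current_points = [[0.0, 0.0], [10.0, 0.0]]
future_points = [[1.0, 0.0], [11.0, 0.0]]
pseudo_labels = sinkhorn_pseudo_labels(current_points, future_points)
```

In this toy case each point is matched almost entirely to its own shifted copy, so the recovered displacements are close to the true motion. In cluttered scenes the soft matches are noisier, which is consistent with the abstract's observation that training directly on such coarse labels leads to inconsistent predictions.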

Paper Structure (17 sections, 9 equations, 5 figures, 5 tables)


Figures (5)

  • Figure 1: Performance comparison across static, slow, and fast speed levels between self-supervised PillarMotion (Luo et al., 2021) and our approach on the nuScenes dataset. The dashed line represents the performance of fully-supervised MotionNet (Wu et al., 2020). Our proposed self-supervised approach outperforms PillarMotion, which uses additional image data, by a large margin and substantially narrows the performance gap with fully-supervised results.
  • Figure 2: Overview of the proposed approach. Without ground truth labels, we first generate pseudo labels by matching. We then introduce cluster, forward, and backward regularization losses to facilitate self-supervised motion learning.
  • Figure 3: Prediction results with inconsistency. (a) Ground truth and pseudo labels. (b) Predictions from the same object are inconsistent. (c) Predictions for consecutive future timestamps are inconsistent. (d) Forward and backward predictions (e.g., $M^{T\rightarrow T+1}$ and $M^{T\rightarrow T-1}$) are inconsistent. Inconsistency regions are highlighted by red circles. The blue arrow denotes the future displacement (motion).
  • Figure 4: Forward-backward divergence of the training set. We analyze the prediction errors of all samples in the training set and their corresponding forward-backward divergences. The forward-backward divergence is positively correlated with the prediction error.
  • Figure 5: Qualitative results of the proposed self-supervised approach. The future displacements ($M^{T\rightarrow T+5}$) are depicted using the color wheel representation. (Zoom in for the best view.)