3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du

TL;DR

The paper addresses the data scarcity and domain gap in LiDAR-based 3D scene flow by proposing 3DSFLabelling, a pseudo-auto-labelling framework that decomposes scene motion into global ego-motion and object-level rigid motions using differentiable anchor-box parameters. It introduces a motion-parameter optimization module and a global-local data augmentation pipeline to generate abundant, realistic pseudo labels ($SF = PC^*_T - PC_S$) without manual annotations, enabling supervised networks to learn from real-world data. Across KITTI, nuScenes, and Argoverse, the approach yields state-of-the-art results and strong cross-domain generalization, with dramatic reductions in EPE3D and improvements in Acc3DS/Acc3DR even when trained on unlabeled data. This plug-and-play framework reduces annotation costs and improves robustness of 3D scene flow estimation for autonomous driving applications.

Abstract

Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, scarcity of real-world 3D labels, and poor performance on real sparse LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR point clouds. Specifically, we employ the assumption of rigid body motion to simulate potential object-level rigid movements in autonomous driving scenarios. By updating different motion attributes for multiple anchor boxes, the rigid motion decomposition is obtained for the whole scene. Furthermore, we developed a novel 3D scene flow data augmentation method for global and local motion. By perfectly synthesizing target point clouds based on augmented motion parameters, we easily obtain lots of 3D scene flow labels in point clouds highly consistent with real scenarios. On multiple real-world datasets including LiDAR KITTI, nuScenes, and Argoverse, our method outperforms all previous supervised and unsupervised methods without requiring manual labelling. Impressively, our method achieves a tenfold reduction in EPE3D metric on the LiDAR KITTI dataset, reducing it from $0.190m$ to a mere $0.008m$ error.
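
The results above are reported in EPE3D, Acc3DS, and Acc3DR, which this page does not define. The sketch below follows the definitions commonly used in the scene flow literature, and these are assumptions here rather than the paper's stated protocol: EPE3D is the mean Euclidean error between predicted and ground-truth flow vectors, Acc3DS is the fraction of points with error below 0.05 m or relative error below 5%, and Acc3DR relaxes both thresholds to 0.1 m and 10%. The function name `scene_flow_metrics` and the stabilizer `eps` are illustrative choices.

```python
import math


def scene_flow_metrics(pred, gt, eps=1e-4):
    """Compute EPE3D, Acc3DS, and Acc3DR for per-point flow vectors.

    pred, gt: equal-length sequences of (dx, dy, dz) tuples in metres.
    Thresholds follow the commonly used definitions (an assumption here).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")

    # Per-point end-point error: Euclidean distance between flow vectors.
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    # Relative error with respect to the ground-truth flow magnitude;
    # eps avoids division by zero for static points.
    relative = [e / (math.hypot(*g) + eps) for e, g in zip(errors, gt)]

    n = len(errors)
    strict = sum(1 for e, r in zip(errors, relative) if e < 0.05 or r < 0.05)
    relaxed = sum(1 for e, r in zip(errors, relative) if e < 0.1 or r < 0.1)
    return {
        "EPE3D": sum(errors) / n,
        "Acc3DS": strict / n,
        "Acc3DR": relaxed / n,
    }
```

Lower EPE3D is better, while higher Acc3DS and Acc3DR are better, which is why the abstract pairs a reduction in EPE3D with improvements in the accuracy metrics.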

Paper Structure (17 sections, 8 equations, 6 figures, 5 tables)

This paper contains 17 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The proposed 3D scene flow pseudo-auto-labelling framework. Given point clouds and initial bounding boxes, both global and local motion parameters are iteratively optimized. Diverse motion patterns are augmented by randomly adjusting these motion parameters, thereby creating a diverse and realistic set of motion labels for the training of 3D scene flow estimation models.
  • Figure 2: The accuracy improvement after integrating our proposed pseudo-auto-labelling method. Models trained on synthetic data perform poorly in 3D scene flow estimation for LiDAR-based autonomous driving. Our proposed 3D pseudo-auto-labelling method improves accuracy, reaching an EPE3D below $2cm$ across the KITTI, Argoverse, and nuScenes datasets.
  • Figure 3: The proposed learning framework of pseudo 3D scene flow automatic labelling. The input comprises 3D anchor boxes, a pair of point clouds, and their corresponding coarse normal vectors. The optimization of motion parameters primarily updates the bounding box parameters, global motion parameters, local motion parameters, and the motion probability of the boxes. The attribute parameters for boxes are updated through backward optimization from six objective functions. Once optimized, the motion parameters simulate various types of motion using a global-local data augmentation module. A single source frame point cloud, along with the augmented motion parameters, produces diverse 3D scene flow labels. These labels serve to guide the supervised neural network to learn point-wise motion.
  • Figure 4: The proposed pseudo label generation module. With the augmented motion probability $P^*_M$, bounding boxes are categorized into dynamic and static types. Using global and local motion parameters, the source point cloud $PC_S$ is warped to the synthesized target point cloud $PC^*_T$. Finally, pseudo 3D scene flow labels $SF$ are derived from the correspondence between $PC^*_T$ and $PC_S$. $K_{box}$ represents the number of boxes.
  • Figure 5: Registration visualization results of our method (GMSF + 3DSFLabelling) and baselines on the LiDAR KITTI and Argoverse datasets. The estimated target point cloud $PC_{sw}$ is obtained by warping the source point cloud $PC_{S}$ toward the target point cloud via the estimated 3D scene flow. The larger the overlap between $PC_{sw}$ (blue) and the target point cloud $PC_T$ (green), the higher the accuracy of the predicted scene flow. Local areas are zoomed in for better visibility. Our 3D scene flow estimation notably improves performance.
  • ...and 1 more figure
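
The label generation described in the captions for Figures 1, 3, and 4 can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: boxes are axis-aligned, rotations are yaw-only, the motion-probability threshold is 0.5, local motion is applied after global motion about the globally moved box center, and augmentation is Gaussian jitter of the motion parameters. All names (`rigid_yaw`, `inside`, `augment`, `pseudo_labels`) and the dictionary layout are hypothetical.

```python
import math
import random


def rigid_yaw(point, yaw, translation, center=(0.0, 0.0, 0.0)):
    """Rotate a point about the vertical axis through `center`, then translate."""
    px, py, pz = point
    cx, cy, _ = center
    tx, ty, tz = translation
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = px - cx, py - cy
    return (c * dx - s * dy + cx + tx, s * dx + c * dy + cy + ty, pz + tz)


def inside(point, center, size):
    """Axis-aligned box membership test (a simplification of oriented boxes)."""
    return all(abs(p - c) <= s / 2.0 for p, c, s in zip(point, center, size))


def augment(params, rng, yaw_sigma=0.02, t_sigma=0.1):
    """Randomly perturb motion parameters (Gaussian jitter is an assumption)."""
    return {
        **params,
        "yaw": params["yaw"] + rng.gauss(0.0, yaw_sigma),
        "t": tuple(t + rng.gauss(0.0, t_sigma) for t in params["t"]),
    }


def pseudo_labels(pc_s, ego, boxes, threshold=0.5):
    """Warp PC_S to a synthetic target PC_T* and return SF = PC_T* - PC_S."""
    flows = []
    for p in pc_s:
        # Global (ego) motion applies to every point in the scene.
        q = rigid_yaw(p, ego["yaw"], ego["t"])
        for box in boxes:
            # Only boxes classified as dynamic contribute local motion.
            if box["p_motion"] > threshold and inside(p, box["center"], box["size"]):
                moved_center = rigid_yaw(box["center"], ego["yaw"], ego["t"])
                q = rigid_yaw(q, box["yaw"], box["t"], center=moved_center)
                break
        flows.append(tuple(qi - pi for qi, pi in zip(q, p)))
    return flows


# Usage: one background point and one point inside a dynamic box.
rng = random.Random(0)
source = [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ego = {"yaw": 0.0, "t": (0.5, 0.0, 0.0)}
box = {"center": (10.0, 0.0, 0.0), "size": (2.0, 2.0, 2.0),
       "yaw": 0.0, "t": (1.0, 0.0, 0.0), "p_motion": 0.9}
labels = pseudo_labels(source, ego, [box])
augmented_labels = pseudo_labels(source, augment(ego, rng), [augment(box, rng)])
```

Because the target point cloud is synthesized from the source, every source point has an exact correspondence, so the flow label is a simple per-point difference. Repeating the call with differently augmented parameters yields many distinct labels from a single source frame.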