3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du
TL;DR
The paper addresses data scarcity and the synthetic-to-real domain gap in LiDAR-based 3D scene flow by proposing 3DSFLabelling, a pseudo auto-labelling framework that decomposes scene motion into global ego-motion and object-level rigid motions using differentiable anchor-box parameters. It introduces a motion-parameter optimization module and a global-local data augmentation pipeline that generate abundant, realistic pseudo labels ($SF = PC_T^* - PC_S$, the synthesized target point cloud minus the source point cloud) without manual annotation, so that supervised networks can be trained on real-world data. Across KITTI, nuScenes, and Argoverse, the approach yields state-of-the-art results and strong cross-domain generalization: on LiDAR KITTI, EPE3D falls from 0.190 m to 0.008 m, and Acc3DS and Acc3DR also improve, even though training uses only unlabeled data. The framework is plug-and-play, reducing annotation cost and improving the robustness of 3D scene flow estimation for autonomous driving.
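For readers unfamiliar with the metrics named above, the following is a minimal sketch of how EPE3D, Acc3DS, and Acc3DR are commonly computed in the scene flow literature. The thresholds (0.05 m or 5% for Acc3DS, 0.1 m or 10% for Acc3DR) follow the widely used convention and are an assumption here, since this summary does not state them; the function name `scene_flow_metrics` and the small epsilon are hypothetical.

```python
import numpy as np

def scene_flow_metrics(pred, gt):
    """Return (EPE3D, Acc3DS, Acc3DR) for (N, 3) predicted and reference flows.

    Thresholds follow the common convention and are assumed, not taken
    from the text above.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    err = np.linalg.norm(pred - gt, axis=1)          # per-point end-point error in metres
    rel = err / (np.linalg.norm(gt, axis=1) + 1e-4)  # relative error; epsilon avoids division by zero
    epe3d = float(err.mean())                                  # mean end-point error
    acc3ds = float(np.mean((err < 0.05) | (rel < 0.05)))       # strict accuracy
    acc3dr = float(np.mean((err < 0.10) | (rel < 0.10)))       # relaxed accuracy
    return epe3d, acc3ds, acc3dr
```

Both accuracy values are fractions in [0, 1]; papers usually report them as percentages.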
Abstract
Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, scarcity of real-world 3D labels, and poor performance on sparse real-world LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR point clouds. Specifically, we employ the assumption of rigid body motion to simulate potential object-level rigid movements in autonomous driving scenarios. By updating different motion attributes for multiple anchor boxes, we obtain a rigid motion decomposition of the whole scene. Furthermore, we develop a novel 3D scene flow data augmentation method for global and local motion. By perfectly synthesizing target point clouds from the augmented motion parameters, we obtain a large number of 3D scene flow labels for point clouds that are highly consistent with real scenarios. On multiple real-world datasets, including LiDAR KITTI, nuScenes, and Argoverse, our method outperforms all previous supervised and unsupervised methods without requiring manual labelling. Impressively, our method achieves a more than tenfold reduction in the EPE3D metric on the LiDAR KITTI dataset, from $0.190\,\mathrm{m}$ to a mere $0.008\,\mathrm{m}$.
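The label-synthesis idea described above can be illustrated with a short sketch: apply a global ego-motion to the source point cloud, apply an extra rigid motion to the points inside each anchor box, and take the difference between the synthesized target and the source as the pseudo label. This is not the authors' implementation. It assumes axis-aligned boxes, yaw-only rotations, and motion parameters that are already given (the paper optimizes and augments them), and every function and parameter name below is hypothetical.

```python
import numpy as np

def rot_z(yaw):
    """Rotation matrix for a yaw angle (radians) about the z axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def points_in_box(pts, center, size):
    """Boolean mask of points inside an axis-aligned box (a simplification)."""
    return np.all(np.abs(pts - center) <= size / 2.0, axis=1)

def synthesize_pseudo_labels(pc_s, ego_yaw, ego_t, boxes):
    """Return (pc_t, flow) for an (N, 3) source cloud.

    boxes: iterable of (center, size, yaw, translation) tuples, one per
    anchor box, describing that object's own rigid motion.
    """
    pc_s = np.asarray(pc_s, dtype=float)
    r_ego = rot_z(ego_yaw)
    ego_t = np.asarray(ego_t, dtype=float)
    # Global step: every point moves with the ego-motion.
    pc_t = pc_s @ r_ego.T + ego_t
    for center, size, yaw, t in boxes:
        center = np.asarray(center, dtype=float)
        mask = points_in_box(pc_s, center, np.asarray(size, dtype=float))
        # Local step: rotate box points about the moved box centre, then translate.
        c_t = r_ego @ center + ego_t
        pc_t[mask] = (pc_t[mask] - c_t) @ rot_z(yaw).T + c_t + np.asarray(t, dtype=float)
    flow = pc_t - pc_s  # pseudo label: SF = PC_T^* - PC_S
    return pc_t, flow
```

Because the target cloud is generated from known motion parameters, the resulting flow is exact for the synthesized pair, which is why no manual annotation is needed.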
