DeFlow: Decoder of Scene Flow Network in Autonomous Driving
Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt
TL;DR
DeFlow addresses real-time scene flow estimation on large-scale LiDAR point clouds by augmenting voxel-based encoders with a GRU-based decoder that reconstructs point-level features. A novel three-way, motion-aware loss balances static and dynamic points, improving accuracy for dynamic regions while maintaining efficiency. Empirical results on Argoverse 2 demonstrate state-of-the-art End Point Error and dynamic point metrics, with four GRU refinement iterations offering the best trade-off between performance and resources. The approach enables real-time deployment and sets the stage for self-supervised and multi-sensor fusion extensions in autonomous driving.
Abstract
Scene flow estimation determines a scene's 3D motion field by predicting the motion of each point in the scene, which supports downstream tasks in autonomous driving. Many networks that take large-scale point clouds as input use voxelization to create a pseudo-image so that they can run in real time. However, voxelization often discards point-specific features, which makes it challenging to recover those features for scene flow tasks. Our paper introduces DeFlow, which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task show that DeFlow achieves state-of-the-art results on large-scale point cloud data, with better performance and efficiency than competing networks. The code is open-sourced at https://github.com/KTH-RPL/deflow.
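To illustrate why the imbalance between static and dynamic points matters, the sketch below compares a plain mean End Point Error with a loss that averages the error separately inside three motion buckets and then sums the bucket means. It is a minimal illustration under stated assumptions, not DeFlow's actual loss: the function names, the use of ground-truth displacement per frame as the bucketing criterion, and the thresholds `low` and `high` are all hypothetical.

```python
import math


def end_point_error(pred, gt):
    """Euclidean distance between a predicted and a ground-truth flow vector."""
    return math.dist(pred, gt)


def plain_mean_epe(pred_flows, gt_flows):
    """Mean end point error over all points, with no balancing."""
    errors = [end_point_error(p, g) for p, g in zip(pred_flows, gt_flows)]
    return sum(errors) / len(errors)


def three_bucket_loss(pred_flows, gt_flows, low=0.05, high=0.1):
    """Sum of per-bucket mean errors, bucketed by ground-truth displacement.

    `low` and `high` (metres per frame) are hypothetical thresholds chosen
    for illustration only; they are not taken from the paper.
    """
    buckets = {"slow": [], "medium": [], "fast": []}
    for pred, gt in zip(pred_flows, gt_flows):
        magnitude = math.hypot(*gt)  # length of the ground-truth flow vector
        if magnitude < low:
            key = "slow"
        elif magnitude <= high:
            key = "medium"
        else:
            key = "fast"
        buckets[key].append(end_point_error(pred, gt))
    # Empty buckets are skipped so they do not contribute to the sum.
    return sum(sum(errs) / len(errs) for errs in buckets.values() if errs)


# Three static points and one fast-moving point; the model predicts zero flow.
gt_flows = [(0.0, 0.0, 0.0)] * 3 + [(2.0, 0.0, 0.0)]
pred_flows = [(0.0, 0.0, 0.0)] * 4

print(plain_mean_epe(pred_flows, gt_flows))
print(three_bucket_loss(pred_flows, gt_flows))
```

In this toy case a model that predicts zero flow everywhere is penalized far less by the plain mean, because the three static points dilute the error on the single moving point. Averaging per bucket keeps the moving point's error from being averaged away.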
