FlowMamba: Learning Point Cloud Scene Flow with Global Motion Propagation

Min Lin, Gangwei Xu, Yun Wang, Xianqi Wang, Xin Yang

TL;DR

FlowMamba targets ill-posed regions in point-cloud scene flow, such as extensive flat regions and occlusions, where local evidence is insufficient. It propagates motion globally through an iterative SSM-based update (ISU) module and handles the irregular ordering of point clouds with a feature-induced ordering (FIO) strategy. The network combines multi-scale feature extraction with a coarse-to-fine, ISU-driven update and a gated fusion mechanism. It reduces the 3D End-Point Error (EPE3D) by 21.9% on FlyingThings3D and 20.5% on KITTI relative to the best published results, reaching millimeter-level accuracy on both datasets. The ISU is also plug-and-play: substituting it for GRU-based components improves existing iterative methods. These results, together with generalization to real-world data and robustness to occlusion, make FlowMamba relevant to autonomous driving and robotic perception.

Abstract

Scene flow methods based on deep learning have achieved impressive performance. However, current top-performing methods still struggle with ill-posed regions, such as extensive flat regions or occlusions, due to insufficient local evidence. In this paper, we propose a novel global-aware scene flow estimation network with global motion propagation, named FlowMamba. The core idea of FlowMamba is a novel Iterative Unit based on the State Space Model (ISU), which first propagates global motion patterns and then adaptively integrates the global motion information with the previous hidden states. As the irregular nature of point clouds limits the performance of the ISU in global motion propagation, we propose a feature-induced ordering strategy (FIO). The FIO leverages semantic-related and motion-related features to order points into a sequence characterized by spatial continuity. Extensive experiments demonstrate the effectiveness of FlowMamba, with 21.9% and 20.5% EPE3D reductions from the best published results on the FlyingThings3D and KITTI datasets, respectively. Specifically, FlowMamba is the first method to achieve millimeter-level prediction accuracy on FlyingThings3D and KITTI. Furthermore, the proposed ISU can be seamlessly embedded into existing iterative networks as a plug-and-play module, improving their estimation accuracy significantly.
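
The abstract reports results as EPE3D reductions, and the Figure 4 caption refers to Acc3DS. As a minimal sketch, the NumPy code below computes both under the definitions commonly used in the scene flow literature: EPE3D as the mean Euclidean distance between predicted and ground-truth flow vectors, and Acc3DS as the fraction of points whose error is below 0.05 m or below 5% of the ground-truth magnitude. The thresholds, function names, and sample values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow."""
    return float(np.linalg.norm(pred_flow - gt_flow, axis=1).mean())


def acc3d_strict(pred_flow, gt_flow, abs_thresh=0.05, rel_thresh=0.05):
    """Fraction of points with small absolute OR small relative error.

    The 0.05 m / 5% thresholds follow common practice (assumed here).
    """
    err = np.linalg.norm(pred_flow - gt_flow, axis=1)
    gt_norm = np.linalg.norm(gt_flow, axis=1)
    rel_err = err / np.maximum(gt_norm, 1e-8)  # guard against zero-length flow
    return float(np.mean((err < abs_thresh) | (rel_err < rel_thresh)))


def relative_reduction(baseline, new):
    """Relative error reduction, e.g. 0.2 means a 20% lower error."""
    return (baseline - new) / baseline


# Hypothetical flows for four points (meters); not data from the paper.
gt = np.array([[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]])
offsets = np.array([[0.01, 0.0, 0.0],
                    [0.0, 0.02, 0.0],
                    [0.0, 0.0, 0.2],
                    [0.0, 0.0, 0.0]])
pred = gt + offsets

print("EPE3D :", epe3d(pred, gt))
print("Acc3DS:", acc3d_strict(pred, gt))
print("Reduction:", relative_reduction(0.010, 0.008))
```

A "21.9% reduction" in this sense is a relative quantity: the new error is compared against the previous best error, not subtracted from it in absolute terms.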

Paper Structure

This paper contains 19 sections, 8 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Left: Comparison with state-of-the-art scene flow methods [liu2024difflow3d, cheng2023multi, wang2023ihnet, fu2023pt, cheng2022bi, wang2022matters] on FlyingThings3D and KITTI. Notably, we achieved millimeter-level precision on both datasets for the first time. Right: Comparison with the accuracy of each layer output on FlyingThings3D. Our FlowMamba can achieve superior results from the coarsest level. In practical applications, adjusting the number of levels and iterations allows for a trade-off between efficiency and accuracy.
  • Figure 2: Overview of our proposed FlowMamba. Feature encoders abstract the point clouds to obtain multi-scale point features and context features. The correlation features are obtained from local cost volumes retrieved from the feature pyramid. The iterative SSM-based update module (ISU) updates the hidden information and scene flow by capturing long-range dependencies and comprehensive motion patterns through global motion propagation. The feature-induced ordering (FIO) strategy constructs reasonable causal dependencies in the point cloud.
  • Figure 3: The architecture of the proposed modules. Left: Iterative SSM-based Update (ISU) module. Middle: Feature-Induced Ordering (FIO) strategy. Right: Bi-directional Mamba block (Bi-Mamba).
  • Figure 4: Qualitative results on the test set of KITTI. Providing global motion propagation improves performance in areas with ambiguous geometric characteristics, such as embankments, roadside grassy areas, and some slender structures (curbs or tracks). Blue, green, and red points respectively indicate the first frame $P_t$, points accurately estimated in $P_t$, and points inaccurately estimated in $P_t$ (measured by Acc3DS).
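
The captions for Figures 2 and 3 describe three cooperating pieces: an ordering step that turns an unordered point set into a sequence, a bidirectional scan over that sequence, and an update that merges the propagated information into the hidden state. The toy NumPy sketch below illustrates only this general pattern. Sorting by a projected scalar score, the fixed-decay linear recurrence standing in for a Mamba block, the averaging of the two scan directions, and the sigmoid gate are all illustrative assumptions, not the authors' architecture, and every name is hypothetical.

```python
import numpy as np


def feature_induced_order(features, proj):
    """Hypothetical ordering: project each point feature to a scalar, then sort."""
    scores = features @ proj  # shape (N,)
    return np.argsort(scores, kind="stable")


def linear_scan(x, decay):
    """Causal recurrence h[i] = decay * h[i-1] + (1 - decay) * x[i].

    A fixed-decay stand-in for a state space scan (assumption).
    """
    out = np.zeros(x.shape, dtype=float)
    state = np.zeros(x.shape[1])
    for i in range(x.shape[0]):
        state = decay * state + (1.0 - decay) * x[i]
        out[i] = state
    return out


def bidirectional_scan(x, decay=0.9):
    """Average a forward and a backward scan so each point sees both sides."""
    forward = linear_scan(x, decay)
    backward = linear_scan(x[::-1], decay)[::-1]
    return 0.5 * (forward + backward)


def gated_update(hidden, global_motion, gate_logits):
    """Blend propagated motion into the hidden state with a sigmoid gate."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return gate * global_motion + (1.0 - gate) * hidden


rng = np.random.default_rng(0)
n_points, dim = 6, 4
features = rng.normal(size=(n_points, dim))  # per-point features (random stand-ins)
hidden = rng.normal(size=(n_points, dim))    # per-point hidden state
proj = rng.normal(size=dim)                  # fixed projection (would be learned)

order = feature_induced_order(features, proj)
inverse = np.argsort(order)  # maps sequence positions back to point indices

# Serialize, propagate along the sequence, then restore the original point order.
propagated = bidirectional_scan(hidden[order])[inverse]
new_hidden = gated_update(hidden, propagated, gate_logits=np.zeros_like(hidden))
print(new_hidden.shape)
```

The sketch makes one design point concrete: a sequence model needs an ordering, and a scan can only propagate information between points that are close in that ordering, so the choice of ordering determines which points exchange motion information most directly.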