RMS-FlowNet++: Efficient and Robust Multi-Scale Scene Flow Estimation for Large-Scale Point Clouds

Ramy Battrawy, René Schuster, Didier Stricker

TL;DR

RMS-FlowNet++ tackles the challenge of estimating 3D scene flow on dense point clouds with high efficiency. It introduces a Patch-to-Dilated-Patch flow embedding and a Random-Sampling enabled hierarchical design that reduces the size of the correspondence set while maintaining accuracy, enabling processing of tens to hundreds of thousands of points without full-resolution matching. The method demonstrates competitive accuracy and superior generalization to KITTI without fine-tuning, along with robust performance under occlusions and at long ranges up to 210 meters. This work advances scalable, accurate scene flow estimation for large-scale LiDAR data, with practical impact on autonomous driving and robust 3D motion understanding.

Abstract

The proposed RMS-FlowNet++ is a novel end-to-end learning-based architecture for accurate and efficient scene flow estimation that can operate on high-density point clouds. For hierarchical scene flow estimation, existing methods rely on expensive Farthest-Point-Sampling (FPS) to sample the scenes, must find large correspondence sets across the consecutive frames and/or must search for correspondences at a full input resolution. While this can improve the accuracy, it reduces the overall efficiency of these methods and limits their ability to handle large numbers of points due to memory requirements. In contrast to these methods, our architecture is based on an efficient design for hierarchical prediction of multi-scale scene flow. To this end, we develop a special flow embedding block that has two advantages over the current methods: First, a smaller correspondence set is used, and second, the use of Random-Sampling (RS) is possible. In addition, our architecture does not need to search for correspondences at a full input resolution. Exhibiting high accuracy, our RMS-FlowNet++ provides a faster prediction than state-of-the-art methods, avoids high memory requirements and enables efficient scene flow on dense point clouds of more than 250K points at once. Our comprehensive experiments verify the accuracy of RMS-FlowNet++ on the established FlyingThings3D data set with different point cloud densities and validate our design choices. Furthermore, we demonstrate that our model has a competitive ability to generalize to the real-world scenes of the KITTI data set without fine-tuning.
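
The cost gap between the two sampling strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `random_sampling`, `farthest_point_sampling`, and `coverage_radius` are hypothetical, and the textbook greedy form of FPS is assumed. RS draws indices without computing any distances, whereas FPS performs one distance pass over all points for every sample it selects.

```python
import numpy as np


def random_sampling(points, m, rng):
    """RS: draw m distinct indices uniformly at random (no distance computations)."""
    return rng.choice(points.shape[0], size=m, replace=False)


def farthest_point_sampling(points, m, rng):
    """Greedy FPS (textbook form): each new sample is the point farthest from
    all samples chosen so far. Cost is O(n * m) distance evaluations."""
    n = points.shape[0]
    idx = np.empty(m, dtype=np.int64)
    idx[0] = rng.integers(n)           # arbitrary starting point
    min_dist = np.full(n, np.inf)      # squared distance to the nearest chosen sample
    for i in range(1, m):
        diff = points - points[idx[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        idx[i] = np.argmax(min_dist)
    return idx


def coverage_radius(points, idx):
    """Largest distance from any input point to its nearest sample
    (smaller means the samples cover the cloud more evenly)."""
    diff = points[:, None, :] - points[idx][None, :, :]
    d2 = np.einsum("nmk,nmk->nm", diff, diff)
    return float(np.sqrt(d2.min(axis=1).max()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy cloud with one dense and one sparse cluster (assumed data, not from the paper).
    dense = rng.normal(loc=0.0, scale=0.5, size=(1900, 3))
    sparse = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
    cloud = np.concatenate([dense, sparse], axis=0)

    rs_idx = random_sampling(cloud, 64, rng)
    fps_idx = farthest_point_sampling(cloud, 64, rng)
    print("RS coverage radius: ", coverage_radius(cloud, rs_idx))
    print("FPS coverage radius:", coverage_radius(cloud, fps_idx))
```

On a cloud with uneven density, RS tends to leave sparse regions under-sampled, which typically shows up as a larger coverage radius than FPS. That is the trade-off a flow embedding has to tolerate if RS is to replace FPS.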

Paper Structure (17 sections, 3 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: Our RMS-FlowNet++ shows an accurate scene flow (Acc3DR) with a low runtime. The accuracy is tested on $\mathrm{KITTI_s}$ [menze2015object] with $8192$ points as input and the runtime is analyzed for all methods equally on a GeForce GTX 1080 Ti.
  • Figure 2: The challenges of RS (right) compared to FPS (left): Both techniques sample two consecutive scenes $P^t$ (blue) and $Q^{t+1}$ (green) into red and pink samples, respectively. Areas of low density are often not sufficiently covered by RS, resulting in dissimilar patterns. The patterns of the corresponding objects are much more similar when FPS is used, making it easier to match the points.
  • Figure 3: We describe the generic pipeline of recent scene flow estimation methods. Like our previous work [battrawy2022rms], our RMS-FlowNet++ estimates scene flow directly from raw point clouds and extracts features based on RandLA-Net [hu2020randla]. Compared to recent scene flow methods, our novel Patch-to-Dilated-Patch flow embedding allows the use of RS along with hierarchical or coarse-to-fine refinement.
  • Figure 4: Our network design consists of feature extraction, flow embedding, warping layers, and scene flow heads, similar to our previous work RMS-FlowNet [battrawy2022rms]. Compared to the feature extraction module in RMS-FlowNet $(a)$, which consists of fully connected (FC) layers at full input resolution plus encoder and decoder modules, we omit the FC layers and the decoder in our RMS-FlowNet++ $(b)$.
  • Figure 5: Our novel flow embedding module consists of four main steps and yields the scene flow feature $sf_i^{t}$: two maximum embedding layers, based on both Euclidean and feature space, followed by two attentive embedding layers. Lateral connections are also used: a concatenation ($Concat.$) between the first two embeddings and a residual connection ($Res.~Conn.$).
  • ...and 6 more figures