EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support

Ramy Battrawy; René Schuster; Didier Stricker

TL;DR

EgoFlowNet estimates non-rigid scene flow from LiDAR point clouds under weak supervision, without clustering or object-level rigidity assumptions. It jointly predicts a point-level foreground/background segmentation mask $M_{fg}$, the ego-motion, and the scene flow in a four-scale, coarse-to-fine pipeline with a shared cost volume and hybrid features. The ego-motion branch uses correspondences and the Kabsch algorithm to estimate $(\hat{R},\hat{t})$, while the scene-flow branch performs multi-stage refinement with dual attention to produce $\hat{S}_k$. The flow of background ($BG$) points, selected by $M^P_{bg}$, is derived from the predicted ego-motion and merged into the final result. The method reports state-of-the-art accuracy on KITTI datasets in the presence of ground points, runs in approximately $140\,\mathrm{ms}$ per frame on a Titan V, and is robust to occlusions.
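The Kabsch step mentioned in the summary can be illustrated with a minimal NumPy sketch. It assumes that point correspondences are already given as two aligned arrays. The function name `kabsch` and the optional per-point `weights` argument are hypothetical choices made for illustration, not the authors' implementation, and how the network obtains correspondences or weights is not specified here.

```python
import numpy as np

def kabsch(src, dst, weights=None):
    """Least-squares rigid fit: find R, t such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points (assumed given).
    weights:  optional (N,) non-negative weights (hypothetical; e.g. a
              soft background mask could play this role).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.ones(len(src)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Weighted centroids and centered point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # 3x3 weighted cross-covariance matrix and its SVD.
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Correct a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

Given noise-free correspondences generated by a known rotation and translation, this sketch recovers that transform up to numerical precision. With real LiDAR data the correspondences are noisy, which is why a weighting scheme of some kind is commonly used.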

Abstract

Recent weakly-supervised methods for scene flow estimation from LiDAR point clouds are limited to explicit reasoning at the object level. These methods perform multiple iterative optimizations for each rigid object, which makes them dependent on the robustness of the clustering step. In this paper, we propose EgoFlowNet, a point-level scene flow estimation network trained in a weakly-supervised manner and without object-based abstraction. Our approach predicts a binary segmentation mask that implicitly drives two parallel branches for ego-motion and scene flow. Unlike previous methods, we provide both branches with all input points and carefully integrate the binary mask into the feature extraction and losses. We also use a shared cost volume with local refinement that is updated at multiple scales without explicit clustering or rigidity assumptions. On realistic KITTI scenes, we show that EgoFlowNet performs better than state-of-the-art methods in the presence of ground surface points.
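As the summary notes, the flow of background points is taken from the predicted ego-motion via $M^P_{bg}$. One plausible reading of that merge is sketched below in NumPy: static points move only because the sensor moves, so their flow is $\hat{R}p + \hat{t} - p$, while foreground points keep the flow from the scene-flow branch. The function names `ego_flow` and `merge_flow` and the hard boolean mask are assumptions made for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def ego_flow(points, R, t):
    """Flow induced on static points by a rigid ego-motion (R, t)."""
    points = np.asarray(points, dtype=float)
    return points @ np.asarray(R).T + np.asarray(t) - points

def merge_flow(points, scene_flow, bg_mask, R, t):
    """Use ego-motion flow for background points, predicted flow otherwise.

    points:     (N, 3) points of the first frame P.
    scene_flow: (N, 3) per-point flow from the scene-flow branch.
    bg_mask:    (N,) boolean mask, True for background points (assumed hard).
    """
    bg = np.asarray(bg_mask, dtype=bool)[:, None]
    return np.where(bg, ego_flow(points, R, t), np.asarray(scene_flow, dtype=float))
```

With an identity rotation and a translation of one meter along x, every background point receives the flow `[1, 0, 0]`, and foreground points are left unchanged.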


Paper Structure (22 sections, 8 equations, 6 figures, 6 tables)

Introduction
Related Work
Network Design
Feature Extraction
Segmentation Head
Shared Cost Volume
Ego-Motion Branch
Scene Flow Branch
Scene Flow of $\mathbf{BG}$ Points
Loss Function
Experiments
Evaluation Metrics
Data Sets and Preprocessing
Comparison to State-of-the-Art
Ablation Study
...and 7 more sections

Figures (6)

Figure 1: Our EgoFlowNet operates non-rigidly at the point-level and shows high accuracy for regions of varying local density (e.g., the red, blue, and green rectangles).
Figure 2: Our EgoFlowNet architecture predicts binary segmentation masks ($M^P_{fg}$ and $M^Q_{fg}$) for foreground points ($FG$) and ($M^P_{bg}$ and $M^Q_{bg}$) for background points ($BG$). We use the binary masks to jointly estimate ego-motion and scene flow at the point-level. For this, we extract hybrid features ($HF^P$ and $HF^Q$) and hierarchically refine our point-wise scene flow.
Figure 3: Three examples from $\mathrm{lidarKITTI}$ [geiger2012we] show the qualitative results of our EgoFlowNet. For visual enhancement only, we show the RGB images of each scene. We visualize the predicted binary mask, where $BG$ and $FG$ points are encoded by gray and orange colors, respectively. The error map for each scene (third row) shows the end-point error in meters and is colored according to the map shown in the last row. Our EgoFlowNet shows low errors (dark blue) over a wide area in each scene, including $FG$ and $BG$ points.
Figure I: Three examples from $\mathrm{lidarKITTI}$ [geiger2012we] show the cases where cars are not fully sensed in the second frame $Q$ and our scene flow prediction partially fails. For visual enhancement only, we show the RGB images of each scene. We visualize the predicted binary mask, where $BG$ and $FG$ points are encoded by gray and orange or cyan colors, respectively. The error map for each scene (third row) shows the end-point error in meters and is colored according to the map shown in the last row.
Figure II: Six examples from $\mathrm{stereoKITTI}$ [menze2015object] show the qualitative results of our EgoFlowNet.
...and 1 more figures
