Multi-Body Neural Scene Flow

Kavisha Vidanapathirana, Shin-Fang Chng, Xueqian Li, Simon Lucey

TL;DR

MBNSF introduces a multi-body rigidity regularizer that enforces approximate isometry within clusters of a source point cloud to induce SE(3) rigidity without explicitly estimating rigid-body transformations. By coupling this isometric-flow regularization with a continuous neural scene-flow prior, MBNSF preserves continuous motion fields and enables accurate long-term 4D trajectory predictions. The approach leverages DBSCAN clustering and a robust spectral objective to identify and preserve rigid-body relationships, improving scene flow and trajectory performance on real-world LiDAR datasets (Argoverse and Waymo) over state-of-the-art NSFP variants. It also provides practical integrations with NSFP and NTP and demonstrates favorable memory and efficiency characteristics compared to per-cluster alternatives, at the cost of increased offline optimization time. The work advances unsupervised, continuous scene flow estimation and long-term motion modeling in dynamic 3D scenes, with broad applicability to autonomous driving and dynamic scene understanding.
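The clustering step mentioned above can be illustrated with a minimal, brute-force DBSCAN written in plain Python. This is a sketch only: the function name `dbscan`, the toy points, and the values of `eps` and `min_pts` are assumptions chosen for illustration, not the settings or implementation used by MBNSF.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # Brute-force radius query (includes i itself); fine for toy inputs.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep growing
                queue.extend(nbrs)
    return labels

# Two compact groups (stand-ins for rigid bodies) and one stray point.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 0.0), (5.1, 5.0, 0.0), (5.0, 5.1, 0.0),
          (20.0, 0.0, 0.0)]
labels = dbscan(points, eps=0.3, min_pts=2)
```

Each resulting cluster serves as an approximation of one rigid body; the stray point is labelled `-1` and would simply not take part in any per-cluster regularization.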

Abstract

The test-time optimization of scene flow, using a coordinate network as a neural prior, has gained popularity due to its simplicity, lack of dataset bias, and state-of-the-art performance. We observe, however, that although coordinate networks capture general motions by implicitly regularizing the scene flow predictions to be spatially smooth, the neural prior by itself is unable to identify the underlying multi-body rigid motions present in real-world data. To address this, we show that multi-body rigidity can be achieved without the cumbersome and brittle strategy of constraining the $SE(3)$ parameters of each rigid body as done in previous works. This is achieved by regularizing the scene flow optimization to encourage isometry in flow predictions for rigid bodies. This strategy enables multi-body rigidity in scene flow while maintaining a continuous flow field, hence allowing dense long-term scene flow integration across a sequence of point clouds. We conduct extensive experiments on real-world datasets and demonstrate that our approach outperforms the state-of-the-art in 3D scene flow and long-term point-wise 4D trajectory prediction. The code is available at: https://github.com/kavisha725/MBNSF.
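To make the idea of long-term flow integration concrete, the sketch below carries points through several frames with forward Euler steps, $x_{t+1} = x_t + f_t(x_t)$. It assumes each per-frame flow field is available as a callable that can be queried at any 3D position (which is what a continuous flow field permits); `integrate_forward_euler` and `toy_flow` are hypothetical names, and the constant toy field stands in for a trained coordinate network.

```python
def integrate_forward_euler(points, flow_fields):
    """Carry points through consecutive frames with x_{t+1} = x_t + f_t(x_t).

    points      : list of (x, y, z) tuples in the first frame
    flow_fields : one callable per frame pair, mapping a position to a
                  displacement (hypothetical stand-ins for a learned flow)
    Returns the list of point sets, one per frame (the 4D trajectories).
    """
    trajectory = [list(points)]
    current = list(points)
    for flow in flow_fields:
        # Query the flow at the *current* position, which need not coincide
        # with any observed LiDAR point; a continuous field makes this valid.
        current = [tuple(x + dx for x, dx in zip(p, flow(p))) for p in current]
        trajectory.append(current)
    return trajectory

def toy_flow(p):
    # Assumed toy field: every position moves 0.5 m along x per frame.
    return (0.5, 0.0, 0.0)

source = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
trajectory = integrate_forward_euler(source, [toy_flow] * 3)
```

Because every step re-queries the field at the integrated position, errors in the per-frame flow accumulate along the trajectory, which is why shape-preserving (rigid) flow matters more as the horizon grows.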

Paper Structure (47 sections, 14 equations, 11 figures, 5 tables, 1 algorithm)

Figures (11)

  • Figure 1: 2D visualization of 3D flow predictions for points sampled from two rigid bodies. Relying solely on Chamfer distance minimization, NSFP (left) may violate multi-body rigidity due to either (1) the flow predictions for different points collapsing to a single target ($\mathbf{f}_1$), or (2) a point being assigned a target in a different body ($\mathbf{f}_2$), which happens to be closer due to the inconsistent sampling of the scene by LiDAR sensors. Our approach (right) encourages multi-body rigidity and enables predicting accurate motion vectors even for points that don't have a valid target ($\mathbf{f}_2$).
  • Figure 2: Enforcing approximate-isometry in scene flow of rigid bodies. The source point cluster (blue) representing a rigid body (a signpost in this example) is projected (green) to closely match its target (red). Out of the three depicted flow vectors, $\mathbf{f}_1$ and $\mathbf{f}_2$ maintain an approximate-isometry (i.e. $d_{1,2} \approx \hat{d}_{1,2}$), and thus get a high score in $\mathbf{A}$ (i.e. high $\mathbf{A}[1,2]$). $\mathbf{f}_3$ is a noisy flow prediction that violates the approximate-isometry and gets low scores in $\mathbf{A}$ in relation to other flows (i.e. low $\mathbf{A}[i,3], \forall i \neq 3$).
  • Figure 3: Sensitivity to the size of clusters used to approximate rigid bodies. A large number of clusters corresponds to a small average cluster size, which leads to single-point clusters at the extremity (right). A small number of clusters implies large cluster sizes, which combine multiple rigid bodies into one cluster, resulting in the whole point cloud being a single cluster at the extremity (left). The horizontal dashed line shows the baseline NSFP performance (without regularization). The x-axis is in log-scale.
  • Figure 4: Variation of scene flow accuracy $Acc_{.05}$ (\ref{fig:sup_dthr_ablation_accuracy}) and optimization time (\ref{fig:sup_dthr_ablation_efficiency}) with the distance threshold $d_{thr}$ (in Eq. \ref{eq:spatial_consistency}). We select the optimal value $d_{thr} = 0.03\,\mathrm{m}$ (vertical gray dashed line) for our experiments. The x-axis is in log-scale for both plots.
  • Figure 5: Visualization of projecting the source cloud (yellow) to the target (blue), which is 25 frames ($2.5\,s$) apart, using forward Euler integration of scene flow. The second row is a zoom-in of the green box in the first row. Note the motion and shape of the purple and red cars: in this example, the red car overtakes the purple car when moving from source to target. \ref{fig:sc_viz_before}: there is a large motion between the source and target as the ego-vehicle turns at an intersection. \ref{fig:sc_viz_nsfp}: NSFP has roughly aligned the motions of all points, but the shapes of rigid bodies are now deformed. Note how the purple car (in the second column) is deformed and no longer looks like a car. \ref{fig:sc_viz_ours}: MBNSF (Ours) has aligned the motions while preserving the shapes of all rigid bodies.
  • ...and 6 more figures
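
The Figure 2 caption can be made concrete with a small sketch that builds a pairwise score matrix $\mathbf{A}$ for one cluster and extracts its leading eigenvector by power iteration. The scoring formula used here, $\max(0,\, 1 - (d_{i,j} - \hat{d}_{i,j})^2 / d_{thr}^2)$, is an assumed thresholded form and not necessarily the paper's Eq. for spatial consistency, which is not reproduced in this section; the function names and toy signpost points are likewise hypothetical. Only the threshold $d_{thr} = 0.03\,\mathrm{m}$ is taken from the Figure 4 caption.

```python
import math

def consistency_matrix(src, flow, d_thr):
    """A[i][j] is 1 when the flow preserves the i-j distance and falls to 0
    once the distance changes by d_thr or more (assumed scoring form)."""
    warped = [tuple(p + f for p, f in zip(pt, fl)) for pt, fl in zip(src, flow)]
    n = len(src)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(src[i], src[j])            # distance before flow
            d_hat = math.dist(warped[i], warped[j])  # distance after flow
            A[i][j] = max(0.0, 1.0 - (d - d_hat) ** 2 / d_thr ** 2)
    return A

def leading_eigenvector(A, iters=50):
    """Power iteration; entries indicate membership in the dominant
    mutually consistent group of flow vectors."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Three points on a vertical signpost; f_3 is a noisy flow prediction.
src = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
flow = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
A = consistency_matrix(src, flow, d_thr=0.03)
v = leading_eigenvector(A)
```

In this toy case $\mathbf{f}_1$ and $\mathbf{f}_2$ score highly against each other while $\mathbf{f}_3$ scores zero against both, so its entry in the leading eigenvector decays towards zero. A regularizer built on such a spectral quantity would reward flows that keep a cluster mutually consistent; how MBNSF forms its exact objective from $\mathbf{A}$ is not specified in this section.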