Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomas Svoboda

TL;DR

This work tackles self-supervised 3D scene flow from point clouds by addressing degeneracies arising from proximity-based rigid clustering. It introduces two model-agnostic losses: $\mathcal{L}_{\textit{surf}}$, a surface-aware smoothness term, and $\mathcal{L}_{\textit{cyc}}$, a forward-backward cyclic smoothness term, which yield larger and more coherent rigid clusters and improved temporal consistency. The method achieves state-of-the-art performance across four driving datasets and is validated on both stereo KITTI and LiDAR-based benchmarks, while remaining compatible with leading architectures such as SCOOP and Neural Prior. Its practical impact lies in providing robust, plug-and-play regularization for self-supervised 3D scene flow, enabling more reliable perception for autonomous driving and robotics.

Abstract

Learning without supervision how to predict 3D scene flows from point clouds is essential to many perception systems. We propose a novel learning framework for this task which improves the necessary regularization. Relying on the assumption that scene elements are mostly rigid, current smoothness losses are built on the definition of "rigid clusters" in the input point clouds. The definition of these clusters is challenging and has a significant impact on the quality of predicted flows. We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects. In particular, we enforce temporal consistency with a forward-backward cyclic loss and spatial consistency by considering surface orientation similarity in addition to spatial proximity. The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models, as demonstrated on the two most widely used architectures. We also showcase the effectiveness and generalization capability of our framework on four standard sensor-unique driving datasets, achieving state-of-the-art performance in 3D scene flow estimation. Our code is available at https://github.com/ctu-vras/sac-flow.
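To make the idea of spatial consistency "by considering surface orientation similarity in addition to spatial proximity" concrete, the following is a minimal sketch, not the authors' implementation. It assumes unit surface normals are already estimated, and it combines the two cues by simply adding a normal-dissimilarity penalty to the Euclidean distance. The function names, the `normal_weight` parameter, and the additive combination are all hypothetical choices made for illustration.

```python
import math

def surface_aware_neighbors(points, normals, k, normal_weight=1.0):
    """For each point, return indices of its k nearest neighbors under a
    combined score: Euclidean distance + normal_weight * (1 - |n_i . n_j|).
    ASSUMPTION: this additive score is illustrative, not the paper's formula."""
    result = []
    for i, (p, n) in enumerate(zip(points, normals)):
        scored = []
        for j, (q, m) in enumerate(zip(points, normals)):
            if i == j:
                continue
            euclid = math.dist(p, q)
            # 0 for parallel (or flipped) unit normals, 1 for perpendicular ones
            dissim = 1.0 - abs(sum(a * b for a, b in zip(n, m)))
            scored.append((euclid + normal_weight * dissim, j))
        scored.sort()
        result.append([j for _, j in scored[:k]])
    return result

def smoothness_loss(flows, neighbors):
    """Mean squared flow difference between each point and its neighbors."""
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(flows[i], flows[j]))
            count += 1
    return total / max(count, 1)

# Toy scene: three floor points (normal +z) next to two wall points (normal +x).
points = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0), (1.1, 0, 0.1), (1.1, 0, 0.6)]
normals = [(0, 0, 1)] * 3 + [(1, 0, 0)] * 2

# Proximity only: the floor point at x=1.0 is linked to the nearby wall point.
print(surface_aware_neighbors(points, normals, k=1, normal_weight=0.0)[2])  # [3]
# With the normal penalty it is linked to another floor point instead.
print(surface_aware_neighbors(points, normals, k=1, normal_weight=1.0)[2])  # [1]
```

In this toy scene the proximity-only neighborhood crosses the floor/wall boundary, which is the kind of cluster that would tie together the flows of unrelated surfaces. Feeding the surface-aware neighborhoods into `smoothness_loss` restricts the smoothness penalty to points that are both close and similarly oriented.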

Paper Structure (22 sections, 9 equations, 6 figures, 4 tables)


Figures (6)

  • Figure 1: Proposed self-supervised scene flow framework. Self-supervised scene flow prediction is usually trained with losses that enforce the alignment of source and target point clouds and the smoothness of the flow ($\mathcal{L}_{\textit{dist}}$ and $\mathcal{L}_{\textit{smooth}}$ respectively). We improve the latter by introducing a surface-aware loss, $\mathcal{L}_{\textit{surf}}$, and a cyclic temporal consistency one, $\mathcal{L}_{\textit{cyc}}$. The proposed framework outperforms the state of the art on all tested datasets.
  • Figure 2: Illustration of baseline (top) and proposed (bottom) losses to train self-supervised scene flows. (Top-left) Classic approaches first enforce the alignment of the two point clouds, irrespective of their structure. This results in wrong correspondences and incorrect flows. (Top-right) To improve results, local smoothness enforces motion consistency within rigid clusters. Defined only on the proximity of points, such clusters can typically be too small ($R_2$) or connect unrelated rigid bodies ($R_1$), which limits the effectiveness of the smoothness loss. (Bottom-left) By taking into account surface orientation similarity in addition to spatial proximity in the definition of clusters, we mitigate the latter issue. (Bottom-right) We also propose a new cyclic consistency loss that enforces two-way time consistency between the source and target point clouds, based on significantly larger and more accurate rigid clusters. In each figure, flow vectors are colored by rigid clusters ('r.c.'), e.g., they are all colored differently when using only $\mathcal{L}_{\textit{dist}}$.
  • Figure 3: Cyclic Smoothness loss. The new loss $\mathcal{L}_{\textit{cyc}}$ enforces the same flow (dashed green arrow) over the rigid cluster $R_{\textit{cyc}}(\mathbf{x})$ (light blue set), defined as follows. Given the source point $\mathbf{x}$ and its best match $\mathbf{y}^\star_\mathbf{x}$ in the target point cloud according to flow $F$, we construct the $k$-nearest neighborhood $N_Y^k(\mathbf{y}^\star_\mathbf{x})$ of that match (light red set); any source point $\mathbf{r}$ whose flowed position $\mathbf{r}+\mathbf{f}_{\mathbf{r}}$ lands in this neighborhood is included in the rigid cluster $R_{\textit{cyc}}(\mathbf{x})$. The proposed $\mathcal{L}_{\textit{cyc}}$ (left) explicitly detects the rigid object as a compact cluster via normal similarity in the target point cloud and then propagates this knowledge directly to the source point cloud by enforcing rigid flow. In contrast, the baseline Cycle Consistency loss $\mathcal{L}_{\textit{cycle}}$ [Mittal_2020_CVPR] (right) implicitly detects the rigid object by running the flow prediction on the green point cloud ($\mathbf{x} + \mathbf{f}_{\mathbf{x}}$) in the backward direction and then enforces the flow (blue arrows) to return to its source.
  • Figure 4: Example of improvements brought by the proposed framework on real LiDAR data. In this scene from the Argoverse dataset, we show the per-point flow estimation error encoded by color on a logarithmic scale. While the Neural Prior [li2021neural] baseline on the left fails to produce consistent flows along the majority of the vehicle body (orange ellipse), the addition of our proposed losses restores rigid motion over the full body. The same applies to the pole (green ellipse) and the wall in the back (red ellipse).
  • Figure 5: Ablation of normal estimation on KITTI$_t$. Influence of the number of neighboring points used to compute surface normals on performance. The best performance for the SCOOP model is obtained with $k_n=4$; further enlarging the neighborhood degrades performance.
  • ...and 1 more figures
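
The construction of the rigid cluster $R_{\textit{cyc}}(\mathbf{x})$ described in the Figure 3 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: "best match" is taken to be the nearest target point to the flowed source point, a source point is counted as "sent" into the neighborhood when its own best match belongs to it, and the neighborhood is a plain Euclidean $k$-nearest neighborhood (the caption indicates that normal similarity in the target is also used, which is omitted here). All function names are hypothetical.

```python
import math

def nearest_index(query, cloud):
    """Index of the point in `cloud` closest to `query`."""
    return min(range(len(cloud)), key=lambda j: math.dist(query, cloud[j]))

def knn_indices(query, cloud, k):
    """Indices of the k points in `cloud` closest to `query`."""
    return sorted(range(len(cloud)), key=lambda j: math.dist(query, cloud[j]))[:k]

def cyclic_clusters(source, flows, target, k):
    """For each source point x, collect the source points r whose flowed
    position r + f_r is matched inside the k-neighborhood of x's best match.
    ASSUMPTION: best match = nearest target point to the flowed source point."""
    warped = [tuple(a + b for a, b in zip(x, f)) for x, f in zip(source, flows)]
    matches = [nearest_index(w, target) for w in warped]
    clusters = []
    for i in range(len(source)):
        hood = set(knn_indices(target[matches[i]], target, k))
        clusters.append([r for r, m in enumerate(matches) if m in hood])
    return clusters

def cyclic_smoothness_loss(flows, clusters):
    """Mean squared difference between each point's flow and the flows of the
    other members of its cluster (zero when every cluster moves as one)."""
    total, count = 0.0, 0
    for i, cluster in enumerate(clusters):
        for r in cluster:
            total += sum((a - b) ** 2 for a, b in zip(flows[i], flows[r]))
            count += 1
    return total / max(count, 1)

# Toy scene: object A (two points) moves by +1 along x, object B is static.
source = [(0, 0, 0), (0.2, 0, 0), (5, 0, 0), (5.2, 0, 0)]
target = [(1, 0, 0), (1.2, 0, 0), (5, 0, 0), (5.2, 0, 0)]
flows = [(1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 0, 0)]

clusters = cyclic_clusters(source, flows, target, k=2)
print(clusters)  # [[0, 1], [0, 1], [2, 3], [2, 3]]
print(cyclic_smoothness_loss(flows, clusters))  # 0.0
```

Because the cluster is built from where the flow sends points in the target cloud, it groups source points by their predicted destination rather than by their source proximity alone. In the toy scene, setting one flow of object A to zero makes the loss positive, since that point still falls in the same cluster but no longer moves with it.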