Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency
Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomas Svoboda
TL;DR
This work tackles self-supervised 3D scene flow estimation from point clouds by addressing the degeneracies that arise when rigid clusters are defined by spatial proximity alone. It introduces two model-agnostic losses: $L_{\textit{surf}}$, a surface-aware smoothness term, and $L_{\textit{cyc}}$, a forward-backward cyclic smoothness term. Together they yield larger, more coherent rigid clusters and improved temporal consistency. The method achieves state-of-the-art performance on four driving datasets, covering both the stereo-based KITTI benchmark and LiDAR-based benchmarks, and remains compatible with leading architectures such as SCOOP and Neural Prior. Its practical impact lies in providing robust, plug-and-play regularization for self-supervised 3D scene flow, enabling more reliable perception for autonomous driving and robotics.
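The surface-aware smoothness idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the PCA normal estimate, the radius and cosine thresholds, and the names `pca_normals`, `surface_aware_mask` and `smoothness_loss` are hypothetical choices made for illustration. Two points count as neighbors only if they are within `radius` of each other and their normals are nearly parallel. The loss then averages the flow differences between neighbors, so a static ground plane touching a moving wall is no longer penalized for having a different flow.

```python
# Hypothetical sketch of a surface-aware smoothness term, NOT the SAC-Flow code.
import numpy as np


def pca_normals(points, k=8):
    """Unit normal per point: eigenvector of the smallest eigenvalue of its k-NN covariance."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    knn = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors, including the point itself
    normals = np.empty(points.shape, dtype=float)
    for i, nb in enumerate(knn):
        centered = points[nb] - points[nb].mean(axis=0)
        _, eigvecs = np.linalg.eigh(centered.T @ centered)  # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]
    return normals


def surface_aware_mask(points, normals, radius=0.5, min_cos=0.9):
    """Boolean (N, N) mask of pairs that are close AND lie on similarly oriented surfaces."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    cos = np.abs(normals @ normals.T)  # abs(): PCA normals have an arbitrary sign
    mask = (d2 <= radius ** 2) & (cos >= min_cos)
    np.fill_diagonal(mask, False)  # a point is not its own neighbor
    return mask


def smoothness_loss(flow, mask):
    """Mean L2 difference between the flows of all neighboring pairs in `mask`."""
    i, j = np.nonzero(mask)
    if i.size == 0:
        return 0.0
    return float(np.mean(np.linalg.norm(flow[i] - flow[j], axis=1)))


# Toy scene: a static ground plane (z = 0) next to a wall (x = 0) moving along +x.
ys = [0.0, 0.2, 0.4, 0.6, 0.8]
ground = np.array([[x, y, 0.0] for x in (0.3, 0.5, 0.7) for y in ys])
wall = np.array([[0.0, y, z] for z in (0.3, 0.5, 0.7) for y in ys])
points = np.vstack([ground, wall])
flow = np.vstack([np.zeros_like(ground), np.tile([1.0, 0.0, 0.0], (len(wall), 1))])

normals = np.vstack([pca_normals(ground), pca_normals(wall)])
# min_cos=0.0 disables the orientation test, i.e. proximity-only neighbors.
prox_only = smoothness_loss(flow, surface_aware_mask(points, normals, min_cos=0.0))
surf_aware = smoothness_loss(flow, surface_aware_mask(points, normals))
print(prox_only > 0.0, surf_aware)  # prints: True 0.0
```

With proximity alone, points at the ground/wall corner are grouped together and their differing flows are penalized; adding the orientation test separates the two surfaces, so the correct flow incurs no smoothness penalty.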
Abstract
Learning to predict 3D scene flow from point clouds without supervision is essential to many perception systems. We propose a novel learning framework for this task that improves the regularization it requires. Current smoothness losses rely on the assumption that scene elements are mostly rigid and are therefore built on the definition of "rigid clusters" in the input point clouds. Defining these clusters is challenging and strongly affects the quality of the predicted flows. We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects. In particular, we enforce \emph{temporal} consistency with a forward-backward cyclic loss and \emph{spatial} consistency by considering surface orientation similarity in addition to spatial proximity. The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models, as demonstrated on the two most widely used architectures. We also showcase the effectiveness and generalization capability of our framework on four standard driving datasets, each acquired with a different sensor, achieving state-of-the-art performance in 3D scene flow estimation. Our code is available at https://github.com/ctu-vras/sac-flow.
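The temporal side can be illustrated with a generic forward-backward cycle-consistency penalty: warp the points at time t with the forward flow, apply the backward flow, and measure how far they land from where they started. This is a minimal sketch under stated assumptions, not the paper's $L_{\textit{cyc}}$ (which the authors describe as a cyclic smoothness term). The function names are hypothetical, and transferring the backward flow from the nearest point of the next frame is an assumption made here because the backward flow is only predicted on the points at time t+1.

```python
# Hypothetical forward-backward cycle-consistency sketch, NOT the SAC-Flow loss.
import numpy as np


def nearest_indices(src, dst):
    """Index of the nearest point in `dst` for every point in `src` (brute force)."""
    d2 = np.sum((src[:, None, :] - dst[None, :, :]) ** 2, axis=-1)
    return np.argmin(d2, axis=1)


def cycle_consistency_loss(pc_t, flow_fw, pc_t1, flow_bw):
    """Mean distance between each point and its position after a forward-backward cycle.

    pc_t:    (N, 3) points at time t
    flow_fw: (N, 3) predicted forward flow, t -> t+1, defined on pc_t
    pc_t1:   (M, 3) points at time t+1
    flow_bw: (M, 3) predicted backward flow, t+1 -> t, defined on pc_t1
    """
    warped = pc_t + flow_fw
    nn = nearest_indices(warped, pc_t1)  # assumption: borrow backward flow from nearest t+1 point
    returned = warped + flow_bw[nn]  # forward then backward: ideally back at pc_t
    return float(np.mean(np.linalg.norm(returned - pc_t, axis=1)))


rng = np.random.default_rng(0)
pc_t = rng.uniform(-5.0, 5.0, size=(100, 3))
shift = np.array([0.5, 0.0, 0.0])  # rigid translation between the two frames
pc_t1 = (pc_t + shift)[rng.permutation(100)]  # next frame, in a different point order

flow_fw = np.tile(shift, (100, 1))
consistent = cycle_consistency_loss(pc_t, flow_fw, pc_t1, np.tile(-shift, (100, 1)))
inconsistent = cycle_consistency_loss(pc_t, flow_fw, pc_t1, np.zeros((100, 3)))
print(consistent, inconsistent)  # approximately 0.0 and 0.5
```

A backward flow that exactly undoes the forward motion gives a near-zero loss, while a backward flow that ignores the motion is penalized by the full 0.5 displacement.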
