Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering

Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann

TL;DR

Let-It-Flow presents a self-supervised approach for 3D scene flow that jointly optimizes per-point motion and object clustering in LiDAR sequences. It introduces two cluster types (overlapping soft clusters and progressively growing hard clusters) and combines two rigidity losses with a distance loss to enforce physically plausible motion across multiple independently moving objects. By dropping neural-prior regularization and using an optimization-based framework, it achieves state-of-the-art results among self-supervised methods on Argoverse2, Waymo, and StereoKITTI, with notable improvements for pedestrians and cyclists. The method offers a training-data-free alternative that scales to complex dynamic scenes while remaining GPU-friendly through overlapping neighborhood computations.

Abstract

We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks such as trajectory prediction and instance segmentation. In the absence of ground-truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identify the structural constraints and the use of large and strict rigid clusters as the main pitfalls of current approaches, and we propose a novel clustering approach that combines overlapping soft clusters with a non-overlapping rigid cluster representation. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed-size overlapping soft clusters. We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, including pedestrians, cyclists, and other vulnerable road users. Our code is publicly available at https://github.com/ctu-vras/let-it-flow.
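
To make the idea of a rigidity regularizer over overlapping soft clusters more concrete, the following is a minimal sketch of one generic form such a term could take: each point's k nearest neighbors act as a small overlapping neighborhood, and the penalty measures how much the flow changes distances inside it. This is an illustrative assumption, not the loss defined in the paper; the names `knn_indices` and `soft_rigidity_loss`, the parameter `k`, and the absolute-difference penalty are all hypothetical choices.

```python
from math import dist


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(order[:k])
    return neighbors


def soft_rigidity_loss(points, flow, k=2):
    """Mean absolute change of neighbor distances after applying the flow.

    Assumption for illustration: neighborhoods are plain kNN sets, so they
    overlap between nearby points. A rigid motion keeps all distances and
    therefore gives zero loss.
    """
    moved = [tuple(a + b for a, b in zip(p, f)) for p, f in zip(points, flow)]
    total, count = 0.0, 0
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in nbrs:
            total += abs(dist(moved[i], moved[j]) - dist(points[i], points[j]))
            count += 1
    return total / count if count else 0.0


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rigid_flow = [(0.5, 0.0, 0.0)] * 3  # every point translates together
stretch_flow = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # one point drifts away

rigid_loss = soft_rigidity_loss(points, rigid_flow)
stretch_loss = soft_rigidity_loss(points, stretch_flow)
```

In this toy case the shared translation leaves the loss at zero, while the flow that moves a single point away from its neighbors is penalized. An optimization-based method would minimize a term of this kind together with a data term such as a nearest-neighbor distance between the warped and target point clouds.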

Paper Structure (23 sections, 7 equations, 5 figures, 5 tables)

This paper contains 23 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Performance comparison of the proposed method with self-supervised competitors on the Argoverse2 dataset [Argoverse]. Our method is able to distinguish the different motion patterns and separate objects, while other methods [Chodosh2023reevaluating, vidanapathirana2023mbnsf, Li2023Fast, li2021neural] tend to under-segment objects and fit incorrect rigid motion. The qualitative example comes from the Waymo dataset [ScalableWaymo2022].
  • Figure 2: Outline of the proposed losses: The left image shows a point cloud, represented by black crosses, containing two vertical objects; the silhouettes were added for visualization. No spatial clustering would segment the objects correctly; therefore, any independent flow estimation will be strongly biased by the incorrect clustering. In contrast, we cover the point cloud by (i) non-overlapping hard rigid clusters and (ii) overlapping soft rigid clusters. The right image demonstrates losses, visualized by springs, used for the flow estimation. The resulting flow is used to merge the hard clusters in the spatio-temporal domain. The procedure is repeated until convergence. The resulting hard rigid clusters deliver rigid object segmentation.
  • Figure 3: Left: clusters from the method of [vidanapathirana2023mbnsf], where each color denotes a single DBSCAN cluster meant for a single rigid motion fit with outlier rejection. Right: our over-segmenting Euclidean clustering (each cluster denoted by a color) and soft rigidity connections, which can spill over onto other objects but can be rejected as outliers. In the previous method, a building clustered together with three pedestrians allows only a single rigid motion (stationary, since building points dominate), so the pedestrian motion is rejected in the loss. In contrast, motions on the right are treated separately for each object, with the exception of two pedestrians walking right next to each other.
  • Figure 4: Effect of the hard-clustering parameter Epsilon, which denotes the grouping radius for Euclidean clustering, on the dynamic end-point error.
  • Figure 5: Speed-performance comparison of the optimization-based methods with the Neural Prior regularizer. We show the relation between performance and the time spent on iterations. Fast Neural Prior [Li2023Fast] converges fastest, and our proposed loss achieves the best performance. All methods were measured with the accelerator proposed in [Li2023Fast].
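
The captions for Figures 3 and 4 refer to Euclidean clustering with a grouping radius Epsilon. As a rough illustration of what that parameter controls, here is a minimal sketch of radius-based clustering implemented as connected components with a union-find structure. It is not taken from the authors' code; the function name `euclidean_clusters`, the brute-force pairwise search, and the toy points are assumptions made for the example.

```python
from math import dist


def euclidean_clusters(points, eps):
    """Label points so that points linked by chains of gaps <= eps share a label."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) neighbor search; a real pipeline would use a spatial index.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Map each root to a compact label in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.4, 0.0, 0.0),  # three points 0.2 apart
    (2.0, 0.0, 0.0), (2.1, 0.0, 0.0),                    # a separate, tighter pair
]
small_eps_labels = euclidean_clusters(points, eps=0.15)
large_eps_labels = euclidean_clusters(points, eps=0.25)
```

With the smaller radius only the tight pair is grouped, so the scene is over-segmented into four clusters; with the larger radius the first three points chain into one cluster and two clusters remain. This is the trade-off the Epsilon parameter governs: a small radius yields many small clusters, while a large radius risks merging separate objects.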