Table of Contents
Fetching ...

Low Latency Instance Segmentation by Continuous Clustering for LiDAR Sensors

Andreas Reich, Mirko Maehlisch

TL;DR

This work tackles the challenge of low-latency LiDAR instance segmentation by introducing continuous clustering, which processes a LiDAR range image as an infinitely expanding stream and outputs complete object instances online during rotation. The method employs a two-layer data structure (point trees at the bottom and a high-level graph on top) to incrementally connect nearby points within a threshold $d_T$, and a cluster-generation step that performs connected-component labeling as new columns arrive, publishing clusters as soon as $\varphi_{\text{cont, rear}} > \varphi_{\text{finished},v}$. Key contributions include the real-time, column-by-column processing pipeline, the ability to handle multi-axis and solid-state LiDARs, and two real-time heuristics to boost throughput without sacrificing core correctness. Empirical results on SemanticKitti show an average latency of $5~\mathrm{ms}$ (with $\sigma = 8~\mathrm{ms}$) on a 128-laser Velodyne in urban scenes, demonstrating substantial latency reduction while maintaining competitive segmentation performance; the approach also enables a larger field of view for challenging occluded scenarios, with potential benefits for downstream tasks such as tracking and SLAM.

Abstract

Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently-used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ a new technique, which we call continuous clustering. Unlike most existing clustering approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. Our approach does not rely on the concept of complete or partial sensor rotations with multiple discrete range images; instead, it views the range image as a single and infinitely horizontally growing entity. Each new column of this continuous range image is processed as soon it is available. Obstacle points are clustered to existing instances in real-time and it is checked at a high-frequency which instances are completed in order to publish them without waiting for the completion of the revolution or some other integration period. In the case of rotating sensors, no problematic discontinuities between the points of the end and the start of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering. It is able to achieve an average latency of just 5 ms with respect to the latest timestamp of all points in the cluster. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.

Low Latency Instance Segmentation by Continuous Clustering for LiDAR Sensors

TL;DR

This work tackles the challenge of low-latency LiDAR instance segmentation by introducing continuous clustering, which processes a LiDAR range image as an infinitely expanding stream and outputs complete object instances online during rotation. The method employs a two-layer data structure (point trees at the bottom and a high-level graph on top) to incrementally connect nearby points within a threshold , and a cluster-generation step that performs connected-component labeling as new columns arrive, publishing clusters as soon as . Key contributions include the real-time, column-by-column processing pipeline, the ability to handle multi-axis and solid-state LiDARs, and two real-time heuristics to boost throughput without sacrificing core correctness. Empirical results on SemanticKitti show an average latency of (with ) on a 128-laser Velodyne in urban scenes, demonstrating substantial latency reduction while maintaining competitive segmentation performance; the approach also enables a larger field of view for challenging occluded scenarios, with potential benefits for downstream tasks such as tracking and SLAM.

Abstract

Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently-used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ a new technique, which we call continuous clustering. Unlike most existing clustering approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. Our approach does not rely on the concept of complete or partial sensor rotations with multiple discrete range images; instead, it views the range image as a single and infinitely horizontally growing entity. Each new column of this continuous range image is processed as soon it is available. Obstacle points are clustered to existing instances in real-time and it is checked at a high-frequency which instances are completed in order to publish them without waiting for the completion of the revolution or some other integration period. In the case of rotating sensors, no problematic discontinuities between the points of the end and the start of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering. It is able to achieve an average latency of just 5 ms with respect to the latest timestamp of all points in the cluster. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.
Paper Structure (15 sections, 7 figures, 1 table)

This paper contains 15 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Continuous clustering: (a) An infinite range image is constructed from a continuous stream of firings (a single shot from all laser diodes). Each point within a firing has a specific constant azimuth offset in order to reduce interference between laser beams. (b) Once a column in the range image is complete (rearmost laser of firing has passed the column), the points are classified as ground (green) and obstacle points (red). (c) The obstacle points are clustered if their mutual distance is below a threshold $d_\text{T}$ and published as soon as no more points can be added (rearmost laser is distant enough).
  • Figure 2: Continuous Clustering Pipeline
  • Figure 3: Ground Point Classification: In the first pass (a) quite certain ground points are labeled based on the ego vehicle's bounding box, the sensor's height over ground and the slope between consecutive points in a column. In the second pass (b) remaining points are as labeled as ground, if their absolute z distance to the estimated terrain (gray) is below a threshold. The terrain estimation forkel_2021_probabilistic gets the certain ground points (a) of the previous time steps as input.
  • Figure 4: Clustering of a X-shaped object: Each incoming column of the range image is processed from top to bottom. The required size of the field of view (FOV, blue area) for the current point $p$ (black circle) is calculated such that it includes all neighbor points $p'$, which potentially could have a distance smaller than the distance threshold: $\lVert \overrightarrow{pp'} \rVert < d_\text{T}$. More specifically, all cells have to be evaluated whose angular difference to the current cell is smaller than $\arcsin(d_\text{T} / r_p)$, where $r_p$ is the current point's radius w.r.t. sensor origin. The cells inside the FOV are evaluated as indicated by the arrows (from dark to bright). If no neighbor is found, the current point becomes the root of a new point tree (black star). Otherwise, it becomes the child (black arrow) of the neighbor found first. For the point in the second image, the FOV includes neighbors from different trees with distance below $d_\text{T}$. This means, that the point trees belong to the same cluster and therefore a mutual higher-level link (orange) is generated. This results in a high-level graph, where the vertices are point trees.
  • Figure 5: Heuristic A: Instead of evaluating the full field of view (blue), only the yellow part with configurable size is completely evaluated. The outer blue part is only evaluated subsequently as long no neighbor was found. The yellow area is important to generate a set of high-level edges $E_\text{approx}$, which is ideally as complete as $E$ in the exact solution.
  • ...and 2 more figures