Is clustering enough for LiDAR instance segmentation? A state-of-the-art training-free baseline

Corentin Sautier, Gilles Puy, Alexandre Boulch, Renaud Marlet, Vincent Lepetit

TL;DR

This work tackles LiDAR panoptic segmentation by showing that a training-free baseline can rival supervised methods. It introduces ALPINE, a per-class BEV clustering pipeline that uses a fast kNN graph and connected components to extract instances from semantic predictions, with class-wise thresholds and a box-splitting refinement. Empirical results across SemanticKITTI, nuScenes, and SemanticPOSS demonstrate strong PQ performance, often matching or exceeding state-of-the-art methods without any instance labels or training, and with real-time CPU runtime. The study highlights that current instance heads may be saturated and positions ALPINE as a robust, explainable baseline that can be paired with any semantic backbone. It also provides extensive ablations, upper-bound analyses with oracles, and practical parameter settings, suggesting broad applicability and a clear benchmark for future end-to-end panoptic methods.

Abstract

Panoptic segmentation of LiDAR point clouds is fundamental to outdoor scene understanding, with autonomous driving being a primary application. While state-of-the-art approaches typically rely on end-to-end deep learning architectures and extensive manual annotations of instances, the significant cost and time investment required for labeling large-scale point cloud datasets remains a major bottleneck in this field. In this work, we demonstrate that competitive panoptic segmentation can be achieved using only semantic labels, with instances predicted without any training or annotations. Our method outperforms most state-of-the-art supervised methods on standard benchmarks including SemanticKITTI and nuScenes, and outperforms every publicly available method on SemanticKITTI as a drop-in instance head replacement, while running in real-time on a single-threaded CPU and requiring no instance labels. It is fully explainable, and requires no learning or parameter tuning. ALPINE combined with state-of-the-art semantic segmentation ranks first on the official panoptic segmentation leaderboard of SemanticKITTI. Code is available at https://github.com/valeoai/Alpine/

Paper Structure

This paper contains 49 sections, 5 equations, 8 figures, 17 tables, 2 algorithms.

Figures (8)

  • Figure 1: ALPINE clustering. For a given semantic class, a) we project the points into BEV space (subsampled in the figure for visualization purposes), b) we build a kNN graph and filter its edges by length, and c) we extract the connected components.
  • Figure 2: Examples of instance predictions on SemanticKITTI. We present the results obtained with D&M and ALPINE, restricted to the car class. Both methods are training-free clusterings and use the same MinkUNet to obtain pointwise semantic predictions. Compared to the ground truth, D&M does not satisfactorily separate the cars, while ALPINE segments them correctly.
  • Figure 3: Overview. In ALPINE, we take the output of a semantic segmentation model and apply our clustering algorithm to each things class to obtain instance masks and form panoptic predictions.
  • Figure 4: Examples of bounding box splitting. In the top examples, two cars are parked close to each other. Although the clustering merges them, the merged cluster does not fit in a car's box, so it is split. In the bottom examples, the same mechanism is applied to two pedestrians in a bus shelter. Boxes are shown in 3D for illustrative purposes, but the mechanism is purely 2-dimensional.
  • Figure 5: Visual description of the box fitting algorithm
  • ...and 3 more figures
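
The three steps in the Figure 1 caption (BEV projection, kNN graph with edge-length filtering, connected components) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `cluster_bev`, the values of `k` and `max_edge`, and the brute-force neighbor search are assumptions made for readability, whereas the paper describes a fast kNN graph, class-wise thresholds, and an additional box-splitting refinement that is omitted here.

```python
import math


def cluster_bev(points, k=4, max_edge=1.0):
    """Cluster the points of ONE semantic class in bird's-eye view.

    points   : list of (x, y, z) tuples, all predicted as the same class
    k        : neighbors per point (hypothetical value)
    max_edge : longest kNN edge kept, in meters (hypothetical value)
    Returns one integer instance id per input point.
    """
    n = len(points)
    # a) Project to BEV: keep x and y, drop the height.
    bev = [(p[0], p[1]) for p in points]

    # Union-find structure used to extract connected components.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # b) Brute-force kNN (O(n^2); a spatial index would be used in practice).
        neighbors = sorted(
            (math.dist(bev[i], bev[j]), j) for j in range(n) if j != i
        )[:k]
        for dist, j in neighbors:
            if dist <= max_edge:  # filter edges by length
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # c) Connected components: relabel union-find roots as 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Two small groups of points about 14 m apart in the ground plane.
points = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.5), (0.2, 0.3, 1.0),  # first object
    (10.0, 10.0, 0.0), (10.3, 10.0, 0.2),               # second object
]
labels = cluster_bev(points, k=2, max_edge=1.0)
```

Because the height is discarded before the graph is built, points stacked vertically on the same object stay connected, while the long edges between the two groups are removed by the length filter and the groups end up in separate components.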