
MILAN: Milli-Annotations for Lidar Semantic Segmentation

Nermin Samet, Gilles Puy, Oriane Siméoni, Renaud Marlet

TL;DR

MILAN tackles the high cost of annotating lidar point clouds for semantic segmentation by combining self-supervised 3D features with a three-stage pipeline: diverse frame selection, cheap cluster-based annotation of the selected frames, and semi-supervised training that uses pseudo-labels and a teacher-student EMA to exploit unlabeled data. ScaLR-based 3D features drive both the frame-diversity selection and the point clustering, so an annotator labels one point per cluster with a single action and that label is propagated to every point in the cluster. The approach achieves near fully-supervised performance on SemanticKITTI and nuScenes with orders of magnitude fewer annotations than full labeling (as low as one-tenth of a percent of the points in some cases) and is competitive with or superior to state-of-the-art annotation-cost reduction methods. The work emphasizes simplicity, scalability, and practical impact: high-quality lidar segmentation with massively reduced labeling effort.
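
To make the semi-supervised stage more concrete, here is a minimal sketch of a teacher-student EMA update combined with confidence-filtered pseudo-labels. It is an illustration under stated assumptions, not the authors' training code: the weights are plain dictionaries of floats, and the names `ema_update`, `pseudo_labels`, `momentum`, and `threshold` (as well as the use of a confidence threshold at all) are hypothetical choices made for this example.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an exponential moving average of the student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


def pseudo_labels(teacher_probs, threshold=0.9):
    """Keep the teacher's argmax class where it is confident, else None (ignored).

    The confidence threshold is an assumption of this sketch.
    """
    labels = []
    for probs in teacher_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else None)
    return labels


# Toy weights: the teacher drifts slowly toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)

# Two unlabeled points: only the first prediction is confident enough to keep.
targets = pseudo_labels([[0.95, 0.05], [0.6, 0.4]])
print(teacher, targets)
```

In a training loop of this kind, the student would be optimized on the annotated points plus the retained pseudo-labels, and `ema_update` would be called after each optimizer step.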

Abstract

Annotating lidar point clouds for autonomous driving is a notoriously expensive and time-consuming task. In this work, we show that the quality of recent self-supervised lidar scan representations allows a great reduction of the annotation cost. Our method has two main steps. First, we show that self-supervised representations allow a simple and direct selection of highly informative lidar scans to annotate: training a network on these selected scans leads to much better results than a random selection of scans and, more interestingly, to results on par with selections made by SOTA active learning methods. In a second step, we leverage the same self-supervised representations to cluster points in our selected scans. Asking the annotator to classify each cluster, with a single click per cluster, then permits us to close the gap with fully-annotated training sets, while only requiring one thousandth of the point labels.
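
The one-click-per-cluster idea can be sketched as follows. This is a toy illustration, not the paper's implementation: it assumes k-means as the clustering algorithm, picks the point closest to each cluster center as the one shown to the annotator, and simulates the annotator with a lookup into ground-truth labels. The names `kmeans`, `annotate_by_cluster`, and `ask_annotator` are hypothetical.

```python
import math


def kmeans(features, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(features, key=lambda f: min(math.dist(f, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(f, centers[j])) for f in features]
        for j in range(k):
            members = [f for f, a in zip(features, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the converged centers.
    assign = [min(range(k), key=lambda j: math.dist(f, centers[j])) for f in features]
    return assign, centers


def annotate_by_cluster(features, k, ask_annotator):
    """Ask for one label per cluster and propagate it to all cluster members."""
    assign, centers = kmeans(features, k)
    labels = [None] * len(features)
    clicks = 0
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if not members:
            continue
        # Assumption: show the annotator the point closest to the cluster center.
        rep = min(members, key=lambda i: math.dist(features[i], centers[j]))
        label = ask_annotator(rep)
        clicks += 1
        for i in members:
            labels[i] = label
    return labels, clicks


# Six points with 2-D stand-in features forming two well-separated groups.
features = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
truth = ["road", "road", "road", "car", "car", "car"]
labels, clicks = annotate_by_cluster(features, k=2, ask_annotator=lambda i: truth[i])
print(labels, clicks)
```

Here two clicks label all six points. With real scans, the ratio of clicks to points is what determines the annotation budget, and a cluster that mixes several classes receives a single (partly wrong) label, which is why the quality of the features matters.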

Paper Structure (42 sections, 2 equations, 4 figures, 8 tables, 1 algorithm)


Figures (4)

  • Figure 1: Performance (mIoU%) obtained with MILAN using WaffleIron [puy23waffleiron] on SemanticKITTI and nuScenes for different (very) low levels of annotation. We compare our results to those of a model trained in a fully-supervised fashion with $100\%$ of the labels. We observe that with just $0.05\%$ of annotated points on SemanticKITTI, we obtain results on par with those of the fully-supervised model. On the more challenging nuScenes dataset, we reach results similar to the fully-supervised baseline with as little as $0.9\%$ of annotated points.
  • Figure 2: MILAN's annotation pipeline. The sequences of scans are first pruned to remove highly similar consecutive frames. Then a subset of scans with high diversity is selected using our scalable SeedAL method. For each selected scan, the points are clustered using self-supervised features. One point in each cluster is presented to an annotator, who provides a semantic label. The resulting sparse annotations are then propagated to the whole scan by giving the same label to all points falling in the same cluster. This scarce-annotation pipeline is followed by self-training.
  • Figure 3: Comparison of our direct (non-iterative) data selection strategy MILAN with SOTA active learning strategies on SemanticKITTI. All models use SPVCNN as the network architecture and are trained solely on the scans with labeled points (no semi-supervision).
  • Figure 4: Quality of our propagated labels. Left: Our propagated labels after annotating 1% and 2% of the points per scan on SemanticKITTI (first two rows) and nuScenes (last two rows), respectively. Right: Ground truth labels.
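
The first two steps of the pipeline in Figure 2 (pruning similar consecutive frames, then selecting a diverse subset) can be sketched as below. This is only an illustration: it assumes each scan is summarized by one descriptor vector (for example, an average of its self-supervised point features), uses cosine similarity, and substitutes a simple greedy farthest-point selection for the scalable SeedAL method, whose details are not given here. The names `prune_consecutive`, `select_diverse`, `max_sim`, and `budget` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def prune_consecutive(descriptors, max_sim=0.95):
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = [0]
    for i in range(1, len(descriptors)):
        if cosine(descriptors[i], descriptors[kept[-1]]) < max_sim:
            kept.append(i)
    return kept


def select_diverse(descriptors, candidates, budget):
    """Greedy stand-in for diversity selection (not SeedAL).

    Repeatedly adds the candidate whose highest similarity to the
    already chosen frames is the lowest.
    """
    chosen = [candidates[0]]
    while len(chosen) < min(budget, len(candidates)):
        rest = [c for c in candidates if c not in chosen]
        nxt = min(
            rest,
            key=lambda c: max(cosine(descriptors[c], descriptors[s]) for s in chosen),
        )
        chosen.append(nxt)
    return chosen


# Five frames of one sequence; frames 0/1 and 3/4 are near-duplicates.
descriptors = [[1, 0], [0.99, 0.01], [0.7, 0.7], [0, 1], [0.01, 0.99]]
kept = prune_consecutive(descriptors)          # [0, 2, 3]
selected = select_diverse(descriptors, kept, budget=2)  # [0, 3]
print(kept, selected)
```

Pruning first keeps the selection step cheap, since near-identical neighboring frames never enter the pairwise comparisons.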