MILAN: Milli-Annotations for Lidar Semantic Segmentation

Nermin Samet; Gilles Puy; Oriane Siméoni; Renaud Marlet

TL;DR

MILAN tackles the high cost of annotating lidar point clouds for semantic segmentation by combining self-supervised 3D features with a three-stage pipeline: selection of a diverse set of frames, cheap cluster-based annotation of the selected frames, and semi-supervised training that uses pseudo-labels and a teacher-student EMA (exponential moving average) to exploit unlabeled data. ScaLR-based 3D features drive both the frame-diversity criterion and the point clustering, so an annotator can label each cluster center with a single action, and that label is then propagated to all points in the same cluster. The approach reaches near fully-supervised performance on SemanticKITTI and nuScenes with orders of magnitude fewer annotated points than full labeling (as low as one-tenth of a percent in some cases), and it is competitive with or superior to state-of-the-art methods for reducing annotation cost. The work emphasizes simplicity and scalability, aiming at high-quality lidar segmentation with massively reduced labeling effort.
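The teacher-student EMA mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat list of weights, the momentum value and the confidence threshold used to filter pseudo-labels are all assumptions made for illustration.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight a small step toward the student weight.

    new_teacher = momentum * teacher + (1 - momentum) * student
    (momentum is a hypothetical value, not taken from the paper).
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def pseudo_label(class_probs, threshold=0.9):
    """Return the teacher's most likely class, or None if it is not confident.

    The confidence threshold is an assumption of this sketch.
    """
    best = max(range(len(class_probs)), key=lambda c: class_probs[c])
    return best if class_probs[best] >= threshold else None


# Toy example: two weights, one EMA step with momentum 0.9.
teacher = ema_update([0.0, 1.0], [1.0, 0.0], momentum=0.9)

# An unlabeled point is only used for training if the teacher is confident.
confident = pseudo_label([0.05, 0.95])   # class index 1
uncertain = pseudo_label([0.5, 0.5])     # None, point is ignored
```

In this kind of scheme the student is trained by gradient descent on labeled and pseudo-labeled points, while the teacher is never trained directly and only tracks a smoothed copy of the student.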

Abstract

Annotating lidar point clouds for autonomous driving is a notoriously expensive and time-consuming task. In this work, we show that the quality of recent self-supervised lidar scan representations allows a great reduction of the annotation cost. Our method has two main steps. First, we show that self-supervised representations allow a simple and direct selection of highly informative lidar scans to annotate: training a network on these selected scans leads to much better results than a random selection of scans and, more interestingly, to results on par with selections made by SOTA active learning methods. In a second step, we leverage the same self-supervised representations to cluster points in our selected scans. Asking the annotator to classify each cluster, with a single click per cluster, then permits us to close the gap with fully-annotated training sets, while only requiring one thousandth of the point labels.
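The one-click-per-cluster idea can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: the cluster assignments are taken as given (in the paper they come from clustering self-supervised features), and the class names, point counts and helper names are hypothetical.

```python
def propagate_labels(cluster_ids, clicked_labels):
    """Give every point the label the annotator assigned to its cluster.

    cluster_ids    : cluster index of each point in the scan
    clicked_labels : dict mapping cluster index -> class name (one click each)
    """
    return [clicked_labels[c] for c in cluster_ids]


def annotation_fraction(num_clicks, num_points):
    """Fraction of points that received a manual label."""
    return num_clicks / num_points


# Six points grouped into three clusters, three clicks in total.
cluster_ids = [0, 0, 1, 1, 1, 2]
clicked = {0: "road", 1: "car", 2: "vegetation"}
labels = propagate_labels(cluster_ids, clicked)

# With hypothetical numbers: 3 clicks on a 3000-point scan is one thousandth
# of the point labels.
fraction = annotation_fraction(3, 3000)
```

The propagated labels are only as good as the clusters: a cluster that mixes two classes gets a single label, which is one reason a later self-training stage can help.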

Paper Structure (42 sections, 2 equations, 4 figures, 8 tables, 1 algorithm)

Introduction
Related work
Few-shot finetuning.
Active learning (AL).
Annotation granularity.
Annotator task.
Data selection.
Semi-supervised learning.
Method
Frame selection
Manual annotation and pseudo-labeling
Semi-supervision
Comparison to SOTA methods
Experiments
Technical details
...and 27 more sections

Figures (4)

Figure 1: Performance (mIoU, in %) obtained with MILAN using WaffleIron [puy23waffleiron] on SemanticKITTI and nuScenes for different (very) low levels of annotation. We compare our results to those of a model trained in a fully-supervised fashion with 100% of the labels. We observe that with just 0.05% of annotated points on SemanticKITTI, we obtain results on par with those of the fully-supervised model. On the more challenging nuScenes dataset, we reach results similar to the fully-supervised baseline with as little as 0.9% of annotated points.
Figure 2: MILAN's annotation pipeline. The sequences of scans are first pruned to remove highly similar consecutive frames. Then a subset of scans with high diversity is selected using our scalable SeedAL method. For each selected scan, the points are clustered using self-supervised features. One point in each cluster is presented to an annotator, who provides a semantic label. The resulting sparse annotations are then propagated to the whole scan by giving the same label to all points falling in the same cluster. This scarce-annotation pipeline is followed by self-training.
Figure 3: Comparison of our direct (non-iterative) data selection strategy MILAN with SOTA active learning strategies on SemanticKITTI. All models are trained using SPVCNN as network architecture and using solely the scans with labeled points for training (no semi-supervision).
Figure 4: Quality of our propagated labels. Left: Our propagated labels after annotating 1% and 2% of the points per scan on SemanticKITTI (first two rows) and nuScenes (last two rows), respectively. Right: Ground truth labels.
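The first two steps of the pipeline in Figure 2 (pruning near-duplicate consecutive frames, then picking a diverse subset) can be sketched as below. This is a generic greedy farthest-point selection, used here as a stand-in; it is not the SeedAL-based procedure of the paper. The per-scan feature vectors, the distance threshold and the budget are hypothetical.

```python
import math


def prune_consecutive(features, min_dist):
    """Keep a frame only if it is at least min_dist from the last kept frame."""
    kept = [0]
    for i in range(1, len(features)):
        if math.dist(features[i], features[kept[-1]]) >= min_dist:
            kept.append(i)
    return kept


def select_diverse(features, candidates, budget):
    """Greedy farthest-point selection among the candidate frame indices.

    At each step, pick the candidate whose nearest already-selected frame
    is the farthest away.
    """
    selected = [candidates[0]]
    while len(selected) < min(budget, len(candidates)):
        best = max(
            (i for i in candidates if i not in selected),
            key=lambda i: min(math.dist(features[i], features[j]) for j in selected),
        )
        selected.append(best)
    return selected


# One toy 2D descriptor per scan; frame 1 is nearly identical to frame 0.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
kept = prune_consecutive(features, min_dist=1.0)    # drops frame 1
chosen = select_diverse(features, kept, budget=3)   # starts at 0, then picks 3
```

The selection is direct (a single pass with no model retraining between picks), which matches the non-iterative spirit described in Figure 3, although the actual selection criterion in the paper may differ.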
