MILAN: Milli-Annotations for Lidar Semantic Segmentation
Nermin Samet, Gilles Puy, Oriane Siméoni, Renaud Marlet
TL;DR
MILAN tackles the high cost of annotating lidar point clouds for semantic segmentation by combining self-supervised 3D features with a three-stage pipeline: selection of a diverse set of frames, cheap cluster-based annotation of the selected frames, and semi-supervised training that uses pseudo-labels and a teacher-student EMA to exploit the unlabeled data. ScaLR-based 3D features drive both the frame selection and the point clustering, so an annotator can label each cluster with a single action and have that label propagated to every point in the cluster. The approach reaches near fully-supervised performance on SemanticKITTI and nuScenes with orders of magnitude fewer annotations than full labeling (as low as one-tenth of a percent of the point labels in some cases) and is competitive with or superior to state-of-the-art methods for reducing annotation cost. The work emphasizes simplicity, scalability, and practical impact: high-quality lidar segmentation with massively reduced labeling effort.
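The semi-supervised stage mentioned above can be illustrated with a minimal sketch of a teacher-student EMA update and confidence-based pseudo-labeling. The function names, the momentum value, the confidence threshold, and the ignore index are all assumptions made for illustration; the summary does not specify them, and this is not presented as the authors' implementation.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the student weight (exponential moving average).

    `teacher` and `student` are dicts mapping parameter names to arrays.
    The momentum value is an assumed hyperparameter.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }

def pseudo_labels(teacher_probs, threshold=0.9, ignore_index=-1):
    """Turn teacher class probabilities (N x C) into per-point pseudo-labels.

    Points whose top probability is below the (assumed) threshold receive
    `ignore_index` so that a loss can skip them.
    """
    confidence = teacher_probs.max(axis=1)
    labels = teacher_probs.argmax(axis=1)
    return np.where(confidence >= threshold, labels, ignore_index)
```

In such a setup the student would be trained on the annotated points plus the confident pseudo-labels, and `ema_update` would be called after each student optimization step.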
Abstract
Annotating lidar point clouds for autonomous driving is a notoriously expensive and time-consuming task. In this work, we show that the quality of recent self-supervised lidar scan representations allows a great reduction of the annotation cost. Our method has two main steps. First, we show that self-supervised representations allow a simple and direct selection of highly informative lidar scans to annotate: training a network on these selected scans leads to much better results than a random selection of scans and, more interestingly, to results on par with selections made by SOTA active learning methods. In a second step, we leverage the same self-supervised representations to cluster points in our selected scans. Asking the annotator to classify each cluster, with a single click per cluster, then permits us to close the gap with fully-annotated training sets, while only requiring one thousandth of the point labels.
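The two steps can be sketched as follows, under stated assumptions. Scan selection is illustrated with greedy farthest-point sampling over one feature vector per scan, and point clustering with a plain k-means; both are stand-ins chosen for illustration, since the abstract does not say which selection or clustering algorithm is used. The feature arrays are assumed to come from a self-supervised backbone, and all function names are hypothetical.

```python
import numpy as np

def select_diverse_scans(scan_features, budget, seed=0):
    """Greedy farthest-point sampling over scan-level features (S x D).

    Assumed stand-in for the selection step: each new scan is the one
    farthest from all scans already selected.
    """
    scan_features = np.asarray(scan_features, dtype=float)
    rng = np.random.default_rng(seed)
    n = scan_features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance from every scan to its nearest already-selected scan.
    min_dist = np.linalg.norm(scan_features - scan_features[selected[0]], axis=1)
    while len(selected) < min(budget, n):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist = np.linalg.norm(scan_features - scan_features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected

def cluster_points(point_features, k, iters=20, seed=0):
    """Plain k-means on per-point features (N x D); returns a cluster id per point."""
    point_features = np.asarray(point_features, dtype=float)
    rng = np.random.default_rng(seed)
    init = rng.choice(len(point_features), size=k, replace=False)
    centers = point_features[init]  # fancy indexing returns a copy
    for _ in range(iters):
        dists = np.linalg.norm(
            point_features[:, None, :] - centers[None, :, :], axis=2
        )
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = point_features[assign == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return assign

def propagate_cluster_labels(assign, cluster_to_class):
    """Give every point the class the annotator chose for its cluster.

    `cluster_to_class` maps cluster id -> class id (one click per cluster).
    """
    return np.array([cluster_to_class[int(c)] for c in assign])
```

With this scheme the annotation cost of a scan is the number of clusters rather than the number of points, which is where the reduction in labeling effort comes from.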
