Label-efficient Semantic Scene Completion with Scribble Annotations
Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu
TL;DR
This work tackles label-efficient semantic scene completion by introducing ScribbleSC, a benchmark that combines sparse scribble-based semantic annotations with dense geometric occupancy labels. The proposed Scribble2Scene framework works in two stages. In Stage-I, two geometry-aware auto-labelers (Dean-Labeler and Teacher-Labeler) generate high-quality pseudo labels from complete geometry. In Stage-II, a student model that sees only the observed input is trained online with range-guided offline-to-online distillation (RGO$^2$D) from the fixed Teacher-Labeler. On SemanticKITTI and SemanticPOSS, Scribble2Scene approaches fully supervised performance while using only 13.5% labeled voxels, reaching a reported 99% of the mIoU of fully supervised VoxFormer at full range on SemanticKITTI. The approach generalizes across both datasets, improves consistently over scribble-only baselines, and offers a practical, label-efficient path to 3D semantic occupancy estimation in autonomous driving.
Abstract
Semantic scene completion aims to infer 3D geometric structures with semantic classes from camera or LiDAR input, which provides essential occupancy information for autonomous driving. Prior work concentrates on constructing networks or benchmarks in a fully supervised manner. However, dense occupancy grids require point-wise semantic annotations, which incur expensive and tedious labeling costs. In this paper, we build a new label-efficient benchmark, named ScribbleSC, in which sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion. In particular, we propose a simple yet effective approach called Scribble2Scene, which bridges the gap between sparse scribble annotations and full supervision. Our method consists of constructing geometry-aware auto-labelers and training an online model with an offline-to-online distillation module to enhance performance. Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against its fully supervised counterparts, reaching 99% of the performance of the fully supervised models with only 13.5% of voxels labeled. Both the annotations of ScribbleSC and our full implementation are available at https://github.com/songw-zju/Scribble2Scene.
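To make the supervision setting concrete, the following is a minimal sketch of how sparse semantic labels and dense geometric labels could be combined in one training objective. It is an illustration under stated assumptions, not the loss used in Scribble2Scene: the function name `scribble_sc_loss`, the `IGNORE` value of 255, and the flat per-voxel array layout are all hypothetical. The semantic term is a partial cross-entropy evaluated only on scribble-labeled voxels, while the occupancy term is a binary cross-entropy over every voxel.

```python
import numpy as np

IGNORE = 255  # hypothetical marker for voxels that carry no scribble label


def scribble_sc_loss(sem_logits, occ_logits, sem_labels, occ_labels):
    """Illustrative loss for sparse semantic + dense geometric supervision.

    sem_logits: (N, C) per-voxel class scores
    occ_logits: (N,)   per-voxel occupancy scores
    sem_labels: (N,)   class ids, IGNORE where no scribble covers the voxel
    occ_labels: (N,)   1 for occupied, 0 for empty (available for all voxels)
    """
    # Dense geometric term: numerically stable binary cross-entropy on all voxels.
    occ_loss = np.mean(
        np.maximum(occ_logits, 0.0)
        - occ_logits * occ_labels
        + np.log1p(np.exp(-np.abs(occ_logits)))
    )

    # Sparse semantic term: cross-entropy restricted to labeled voxels.
    mask = sem_labels != IGNORE
    if not mask.any():
        return float(occ_loss), 0.0
    z = sem_logits[mask]
    z = z - z.max(axis=1, keepdims=True)  # shift for a stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(z.shape[0]), sem_labels[mask]]
    sem_loss = -picked.mean()

    return float(occ_loss), float(sem_loss)
```

In this sketch, predictions on unlabeled voxels do not affect the semantic term at all, so any semantic signal for those voxels has to come from elsewhere, such as pseudo labels or distillation from a teacher.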
