Label-efficient Semantic Scene Completion with Scribble Annotations

Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu

TL;DR

This work tackles label-efficient semantic scene completion by introducing ScribbleSC, a benchmark that combines sparse scribble-based semantic annotations with dense geometric occupancy labels. The proposed Scribble2Scene framework works in two stages. In Stage-I, two geometry-aware auto-labelers (Dean-Labeler and Teacher-Labeler) generate high-quality pseudo labels from complete geometry. In Stage-II, an online student model that sees only the observed input is trained with range-guided offline-to-online distillation (RGO$^2$D) from the fixed Teacher-Labeler. On SemanticKITTI and SemanticPOSS, Scribble2Scene approaches fully supervised performance while using only 13.5% of the labeled voxels, reaching a reported 99% of the mIoU of the fully supervised VoxFormer at full range on SemanticKITTI. The approach generalizes across datasets, improves consistently over scribble-only baselines, and offers a practical, label-efficient path to 3D semantic occupancy estimation in autonomous driving.

Abstract

Semantic scene completion aims to infer 3D geometric structures with semantic classes from camera or LiDAR input, which provides essential occupancy information for autonomous driving. Prior work concentrates on constructing networks or benchmarks in a fully supervised manner. However, dense occupancy grids require point-wise semantic annotations, which incur expensive and tedious labeling costs. In this paper, we build a new label-efficient benchmark, named ScribbleSC, in which sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion. In particular, we propose a simple yet effective approach called Scribble2Scene, which bridges the gap between sparse scribble annotations and full supervision. Our method consists of geometry-aware auto-labeler construction and online model training with an offline-to-online distillation module to enhance performance. Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against its fully supervised counterparts, reaching 99% of the performance of the fully supervised models with only 13.5% of voxels labeled. Both the ScribbleSC annotations and our full implementation are available at https://github.com/songw-zju/Scribble2Scene.
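
To make the labeling setup concrete, the following is a minimal sketch of how sparse scribble labels and dense geometric labels could be combined in one training objective. It is not taken from the paper or its repository: the ignore marker `IGNORE`, the function names, and the loss weights `w_sem` and `w_geo` are all hypothetical, and the plain cross-entropy terms are an assumed stand-in for whatever losses the authors actually use. The idea shown is only that the semantic term is averaged over scribble-labeled voxels, while the occupancy term covers every voxel.

```python
import math

# Hypothetical marker for voxels that carry no scribble label (an assumption).
IGNORE = 255


def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_cross_entropy(logits_per_voxel, scribble_labels, ignore=IGNORE):
    """Mean cross-entropy over scribble-labeled voxels only."""
    losses = []
    for logits, label in zip(logits_per_voxel, scribble_labels):
        if label == ignore:
            continue  # unlabeled voxel: contributes no semantic loss
        losses.append(-math.log(softmax(logits)[label]))
    return sum(losses) / len(losses) if losses else 0.0


def occupancy_bce(occ_probs, occ_labels, eps=1e-7):
    """Mean binary cross-entropy over all voxels (dense geometric labels)."""
    if not occ_probs:
        return 0.0
    total = 0.0
    for p, y in zip(occ_probs, occ_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(occ_probs)


def scribble_sc_loss(logits_per_voxel, occ_probs, scribble_labels, occ_labels,
                     w_sem=1.0, w_geo=1.0):
    """Weighted sum of the sparse semantic term and the dense geometric term."""
    sem = partial_cross_entropy(logits_per_voxel, scribble_labels)
    geo = occupancy_bce(occ_probs, occ_labels)
    return w_sem * sem + w_geo * geo


if __name__ == "__main__":
    # Three voxels, two classes; only the last two voxels have scribble labels.
    logits = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
    labels = [IGNORE, 0, 1]
    print(scribble_sc_loss(logits, [0.9, 0.2, 0.8], labels, [1, 0, 1]))
```

In this sketch the first voxel still influences training through the occupancy term even though it has no semantic label, which is the sense in which dense geometry complements sparse scribbles.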

Paper Structure

This paper contains 21 sections, 5 equations, 8 figures, and 12 tables.

Figures (8)

  • Figure 1: Comparison between fully supervised methods and our proposed weakly supervised, scribble-based Scribble2Scene approach for semantic scene completion on SemanticKITTI. The top half shows examples of semantic occupancy predictions. The bottom half shows that our scribble-supervised approach achieves 99% of the performance (mIoU) of the fully supervised methods, a significant improvement over the baseline model.
  • Figure 2: Examples of the fully-annotated ground truth from SemanticKITTI (left) and scribble-annotated supervision from our constructed ScribbleSC (right).
  • Figure 3: Number of labeled voxels per category in ScribbleSC (dark color) compared with the fully annotated SemanticKITTI dataset (light color). The total number of labeled voxels in ScribbleSC is only 13.5% of that in SemanticKITTI.
  • Figure 4: Overview of Scribble2Scene. The left half illustrates the offline geometry-aware auto-labelers construction at Stage-I. The right half shows the online model training with distillation at Stage-II. The accurate pseudo labels from Dean-Labeler and the well-trained Teacher-Labeler are fully leveraged for online model optimization.
  • Figure 5: Illustration of the range-guided offline-to-online distillation scheme. The red dot denotes the location of the ego-vehicle. Global and local distillation are performed over different ranges.
  • ...and 3 more figures