Label-efficient Semantic Scene Completion with Scribble Annotations

Song Wang; Jiawei Yu; Wentong Li; Hao Shi; Kailun Yang; Junbo Chen; Jianke Zhu

TL;DR

This work tackles label-efficient semantic scene completion by introducing ScribbleSC, a benchmark that combines sparse scribble-based semantic annotations with dense geometric occupancy labels. The proposed Scribble2Scene framework first builds two geometry-aware auto-labelers (Dean-Labeler and Teacher-Labeler) in Stage-I, which generate high-quality pseudo labels from the complete geometry. Stage-II then performs online training with range-guided offline-to-online distillation (RGO$^2$D) from the fixed Teacher-Labeler to a student model that sees only the observed input. Across SemanticKITTI and SemanticPOSS, Scribble2Scene approaches fully supervised performance with only 13.5% of voxels labeled, reaching a reported 99% of the mIoU of the fully supervised VoxFormer at full range on SemanticKITTI. The approach shows strong generalization and robust gains over scribble-only baselines, and it offers a practical, label-efficient path to 3D semantic occupancy estimation for autonomous driving.
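The TL;DR describes the Stage-II objective only at a high level. As a rough illustration, the sketch below assumes a common formulation: cross-entropy against the Dean-Labeler pseudo labels plus a voxel-wise KL distillation term from the frozen Teacher-Labeler's class distributions to the student's. The loss form, the weight `lam`, the `temperature`, the ignore label 255, and all function names are assumptions for illustration, not the paper's stated objective. Gradients are omitted, so the code only evaluates the loss value.

```python
import math
from typing import List, Sequence

IGNORE = 255  # assumed label for voxels without a (pseudo) semantic label


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one voxel's class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(logits: Sequence[float]) -> float:
    """Stable log(sum(exp(logits)))."""
    peak = max(logits)
    return peak + math.log(sum(math.exp(x - peak) for x in logits))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def stage2_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    pseudo_labels: Sequence[int],
    lam: float = 1.0,
    temperature: float = 1.0,
) -> float:
    """Hypothetical Stage-II objective: pseudo-label CE plus teacher-to-student KD.

    teacher_logits stand in for a frozen teacher and are treated as constants;
    in a real training loop only the student would receive gradients.
    """
    # Cross-entropy against pseudo labels, skipping ignored voxels.
    ce_sum, n_valid = 0.0, 0
    for logits, label in zip(student_logits, pseudo_labels):
        if label == IGNORE:
            continue
        ce_sum += log_sum_exp(logits) - logits[label]  # equals -log softmax[label]
        n_valid += 1
    ce = ce_sum / max(n_valid, 1)

    # Voxel-wise distillation: KL(teacher || student) on softened distributions.
    kd = sum(
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / max(len(student_logits), 1)

    # T^2 scaling is the usual convention when distilling softened targets.
    return ce + lam * temperature ** 2 * kd


# Usage: two voxels, three classes; the second voxel has no pseudo label.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[2.5, 0.0, -1.0], [0.0, 1.0, 0.0]]
print(stage2_loss(student, teacher, [0, IGNORE], lam=0.5, temperature=2.0))
```

Keeping the teacher fixed means its distributions act as stationary soft targets, which matches the TL;DR's description of distilling from a fixed Teacher-Labeler.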

Abstract

Semantic scene completion aims to infer 3D geometric structures with semantic classes from camera or LiDAR input, providing essential occupancy information for autonomous driving. Prior endeavors concentrate on constructing networks or benchmarks in a fully supervised manner. However, dense occupancy grids require point-wise semantic annotations, which incur expensive and tedious labeling costs. In this paper, we build a new label-efficient benchmark, named ScribbleSC, in which sparse scribble-based semantic labels are combined with dense geometric labels for semantic scene completion. In particular, we propose a simple yet effective approach called Scribble2Scene, which bridges the gap between sparse scribble annotations and full supervision. Our method consists of geometry-aware auto-labeler construction and online model training with an offline-to-online distillation module to enhance performance. Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against its fully supervised counterparts, reaching 99% of the performance of the fully supervised models with only 13.5% of voxels labeled. Both the ScribbleSC annotations and our full implementation are available at https://github.com/songw-zju/Scribble2Scene.
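To make the idea of combining sparse scribble semantics with dense geometry concrete, here is a minimal sketch, assuming one simple encoding: every voxel receives geometric supervision (free vs. occupied), but only scribbled occupied voxels receive a semantic class, and the remaining occupied voxels are marked as ignored. The label values `EMPTY = 0` and `IGNORE = 255`, the helpers `combine_labels` and `semantic_label_ratio`, and the dictionary-based grid are hypothetical choices for illustration; the actual ScribbleSC files may be organized differently.

```python
from itertools import product
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]
EMPTY = 0     # assumed label for free (unoccupied) voxels
IGNORE = 255  # assumed label for occupied voxels that carry no scribble


def combine_labels(
    grid_shape: Tuple[int, int, int],
    occupied: Set[Voxel],
    scribbles: Dict[Voxel, int],
) -> Dict[Voxel, int]:
    """Merge dense occupancy with sparse scribble semantics into one label grid."""
    labels: Dict[Voxel, int] = {}
    for voxel in product(*(range(n) for n in grid_shape)):
        if voxel not in occupied:
            labels[voxel] = EMPTY             # dense geometry: free space is known
        elif voxel in scribbles:
            labels[voxel] = scribbles[voxel]  # sparse semantics from a scribble
        else:
            labels[voxel] = IGNORE            # occupied, but class unknown
    return labels


def semantic_label_ratio(labels: Dict[Voxel, int]) -> float:
    """Fraction of occupied voxels that carry a semantic class."""
    occupied = [c for c in labels.values() if c != EMPTY]
    labeled = [c for c in occupied if c != IGNORE]
    return len(labeled) / len(occupied) if occupied else 0.0


# Usage: a 2x2x1 grid with three occupied voxels, one of them scribbled as class 1.
grid = (2, 2, 1)
occ = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}
scr = {(0, 0, 0): 1}
labels = combine_labels(grid, occ, scr)
print(labels[(0, 1, 0)], labels[(1, 0, 0)], semantic_label_ratio(labels))
# prints: 0 255 0.3333333333333333
```

The design choice illustrated here is that geometry stays fully supervised while semantics are sparse; a ratio of this kind is one way to express how few voxels carry a class, though the paper's 13.5% figure is computed against the fully annotated SemanticKITTI counts.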

Paper Structure

This paper contains 21 sections, 5 equations, 8 figures, and 12 tables.

Introduction
Related Work
The ScribbleSC Benchmark
Proposed Method
Overview of Scribble2Scene
Geometry-Aware Auto-Labelers
Online Model Training with Distillation
Training and Inference
Experiments
Experimental Setup
Main Results
Ablation Studies
Conclusion
More Implementation Details
More Details on ScribbleSC

Figures (8)

Figure 1: Comparisons between fully supervised methods and our proposed weakly scribble-supervised Scribble2Scene approach for semantic scene completion on SemanticKITTI. The top half shows examples of semantic occupancy predictions. The bottom half shows that our scribble-supervised approach achieves 99% of the performance (mIoU) of the fully supervised methods and significantly improves over the baseline model.
Figure 2: Examples of the fully-annotated ground truth from SemanticKITTI (left) and scribble-annotated supervision from our constructed ScribbleSC (right).
Figure 3: Number of labeled voxels per category in ScribbleSC (deep color) compared with the fully annotated SemanticKITTI dataset (light color). In total, ScribbleSC labels only 13.5% as many voxels as SemanticKITTI.
Figure 4: Overview of Scribble2Scene. The left half illustrates the offline construction of geometry-aware auto-labelers in Stage-I. The right half shows online model training with distillation in Stage-II. The accurate pseudo labels from the Dean-Labeler and the well-trained Teacher-Labeler are fully leveraged for online model optimization.
Figure 5: Illustration of the range-guided offline-to-online distillation scheme. The red dot denotes the location of the ego-vehicle. Global and local distillation are performed over different ranges.
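Figure 5 names only the ingredients of range-guided distillation: a global term and a local term computed over different ranges around the ego-vehicle. One plausible reading is sketched here, assuming a voxel-wise KL loss from the frozen teacher to the student, a bird's-eye-view (x, y) distance to the ego position, and hypothetical parameters `local_radius`, `w_global`, and `w_local`. The function names and the loss form are illustrative assumptions; the paper's actual ranges, weights, and loss may differ.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one voxel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def range_guided_distillation(
    coords: Sequence[Tuple[float, float, float]],
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    ego_xy: Tuple[float, float],
    local_radius: float,
    w_global: float = 1.0,
    w_local: float = 1.0,
) -> float:
    """Hypothetical global + local (near-ego) teacher-to-student distillation."""
    # Per-voxel KL(teacher || student); the teacher is assumed frozen.
    per_voxel = [
        kl_divergence(softmax(t), softmax(s))
        for s, t in zip(student_logits, teacher_logits)
    ]
    # Global term: average over every voxel in the scene.
    global_term = sum(per_voxel) / max(len(per_voxel), 1)

    # Local term: average only over voxels whose bird's-eye-view distance
    # to the ego-vehicle is within local_radius.
    local = [
        d for (x, y, _), d in zip(coords, per_voxel)
        if math.hypot(x - ego_xy[0], y - ego_xy[1]) <= local_radius
    ]
    local_term = sum(local) / len(local) if local else 0.0
    return w_global * global_term + w_local * local_term


# Usage: one voxel next to the ego-vehicle, one 10 units away.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
student = [[0.0, 0.0], [0.0, 0.0]]
teacher_near = [[5.0, 0.0], [0.0, 0.0]]  # mismatch only near the ego-vehicle
teacher_far = [[0.0, 0.0], [5.0, 0.0]]   # same mismatch, but far away
near = range_guided_distillation(coords, student, teacher_near, (0.0, 0.0), 2.0)
far = range_guided_distillation(coords, student, teacher_far, (0.0, 0.0), 2.0)
print(near > far)
```

Because near-ego voxels are counted in both terms under these assumptions, a teacher-student mismatch close to the vehicle contributes more to the loss than the same mismatch far away.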
