Knowledge Distillation for Semantic Segmentation: A Label Space Unification Approach

Anton Backhaus, Thorsten Luettel, Mirko Maehlisch

TL;DR

This work tackles the challenge of divergent taxonomies across semantic segmentation datasets by introducing a label-space unification framework built on knowledge distillation. A teacher model trained on a source taxonomy generates ontology-constrained pseudo-labels for related datasets, enabling training of a student on a unified dataset that outperforms the teacher in urban and off-road driving tasks. The approach yields large composite datasets and demonstrates that larger models and domain-aware priors significantly boost performance, with robust gains on generalization benchmarks like WildDash, though benefits on some source datasets may vary. Overall, the method provides a simple, architecture-agnostic mechanism to leverage heterogeneous autonomous driving data without re-labeling or extensive model redesign, advancing practical data efficiency and generalization in semantic segmentation.

Abstract

An increasing number of datasets sharing similar domains for semantic segmentation have been published over the past few years. Despite the growing amount of overall data, it is still difficult to train bigger and better models because the taxonomies and/or labeling policies of different datasets are inconsistent. To this end, we propose a knowledge distillation approach that also serves as a label space unification method for semantic segmentation. In short, a teacher model is trained on a source dataset with a given taxonomy, then used to pseudo-label additional data for which ground truth labels of a related label space exist. By mapping the related taxonomies to the source taxonomy, we create constraints within which the model can predict pseudo-labels. Using the improved pseudo-labels, we train student models that consistently outperform their teachers in two challenging domains, namely urban and off-road driving. Our ground truth-corrected pseudo-labels span 12 and 7 public datasets with 388,230 and 18,558 images for the urban and off-road domains, respectively, creating the largest compound datasets for autonomous driving to date.

Paper Structure

This paper contains 10 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Pseudo-label creation for an ApolloScape image: Using the ground truth (b), the taxonomy mapping, and the teacher model prediction (c), a pseudo-ground truth (d) is generated in the source label space. Systematic errors caused by the LiDAR-based semi-automated labeling, such as the missing labels for sky, building, and ego-vehicle (circled), are also fixed.
  • Figure 2: Pseudo-label generation process: An RGB image is fed into a teacher network. The resulting softmax output is first refined using test-time augmentation (TTA), then using the ground truth. Here, pixel $p_{ij}$ is cityscapes:road. Additionally, using the label mapping, the output at $p_{ij}$ is constrained to GOOSE:asphalt, marking, or cobble. The final hard pseudo-label is depicted on the right.
  • Figure 3: Pseudo-label creation on three datasets using the GOOSE source taxonomy, which includes classes such as sidewalk, cobblestone, non-drivable vegetation, low grass, high grass, asphalt, and rough drivable surface.
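
The constraint step described in the Figure 2 caption can be illustrated for a single pixel. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class list, the `LABEL_MAPPING` table, and the function names are hypothetical; TTA refinement is assumed to be a plain average over augmented views; and pixels whose ground-truth label has no mapping entry are assumed to stay unconstrained.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical subset of a source taxonomy (GOOSE-like class names).
SOURCE_CLASSES: List[str] = ["asphalt", "marking", "cobble", "sky", "building"]

# Hypothetical mapping: ground-truth label of the related dataset
# -> set of source classes the pseudo-label may take.
LABEL_MAPPING: Dict[str, Set[str]] = {
    "road": {"asphalt", "marking", "cobble"},
    "sky": {"sky"},
    "building": {"building"},
}


def average_tta(views: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities of one pixel over augmented views.

    Assumption: TTA refinement is a simple mean; the paper may differ.
    """
    n_views = len(views)
    n_classes = len(views[0])
    return [sum(view[c] for view in views) / n_views for c in range(n_classes)]


def constrained_pseudo_label(probs: Sequence[float], gt_label: str) -> str:
    """Pick the most probable source class among those the mapping allows."""
    allowed = LABEL_MAPPING.get(gt_label)
    if allowed is None:
        # Assumption: no mapping entry means no constraint at this pixel.
        candidates = list(range(len(SOURCE_CLASSES)))
    else:
        candidates = [i for i, name in enumerate(SOURCE_CLASSES) if name in allowed]
    best = max(candidates, key=lambda i: probs[i])
    return SOURCE_CLASSES[best]


# Two augmented views of one pixel; the teacher wrongly favors "sky".
probs = average_tta([[0.2, 0.1, 0.1, 0.5, 0.1], [0.3, 0.1, 0.1, 0.4, 0.1]])
print(constrained_pseudo_label(probs, "road"))     # restricted to road-like classes
print(constrained_pseudo_label(probs, "unknown"))  # unconstrained fallback
```

In this toy case the teacher's top prediction is not a road-like class, so the ground-truth constraint overrides it and the pseudo-label is chosen among asphalt, marking, and cobble only. Applying the same function to every pixel yields a hard pseudo-label map in the source label space.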