DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation

Beomseok Kang, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-Pang Chiu, Supun Samarasekera

TL;DR

DUDA addresses the challenge of achieving high-accuracy unsupervised domain adaptation for lightweight semantic segmentation by fusing EMA-based self-training with knowledge distillation through a three-network setup: a large teacher, a large student, and a small student. The framework introduces pre-adaptation with gradual distillation, an inconsistency-based loss weighting to emphasize poorly adapted classes, and multi-teacher learning to improve pseudo-label quality, enabling lightweight models to match or exceed heavyweight baselines across four benchmarks. Empirical results show substantial gains for small backbones, reduced memory and FLOPs, and a reduced gap between Transformer-based and CNN-based architectures, with notable improvements in minority classes. The approach is model-agnostic and can operate in heterogeneous teacher–student configurations, offering practical benefits for edge devices and resource-constrained deployments in semantic segmentation tasks.

Abstract

Unsupervised Domain Adaptation (UDA) is essential for enabling semantic segmentation in new domains without requiring costly pixel-wise annotations. State-of-the-art (SOTA) UDA methods primarily use self-training with architecturally identical teacher and student networks, relying on Exponential Moving Average (EMA) updates. However, these approaches face substantial performance degradation with lightweight models due to inherent architectural inflexibility leading to low-quality pseudo-labels. To address this, we propose Distilled Unsupervised Domain Adaptation (DUDA), a novel framework that combines EMA-based self-training with knowledge distillation (KD). Our method employs an auxiliary student network to bridge the architectural gap between heavyweight and lightweight models for EMA-based updates, resulting in improved pseudo-label quality. DUDA employs a strategic fusion of UDA and KD, incorporating innovative elements such as gradual distillation from large to small networks, inconsistency loss prioritizing poorly adapted classes, and learning with multiple teachers. Extensive experiments across four UDA benchmarks demonstrate DUDA's superiority in achieving SOTA performance with lightweight models, often surpassing the performance of heavyweight models from other approaches.
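The two mechanisms the abstract combines can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the flat weight lists standing in for network parameters, and the decay value `alpha` are all assumptions made for this example. It shows why an EMA update needs architecturally identical teacher and student networks (the weights are averaged element by element), and how a teacher's prediction can serve as a pseudo-label for a network of any size.

```python
import math


def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    Weights are averaged element-wise, so both networks must have the same
    shape. `alpha` is an illustrative decay value, not taken from the paper.
    """
    if len(teacher) != len(student):
        raise ValueError("EMA requires architecturally identical networks")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits):
    """Return (class index, confidence) of the teacher's prediction for one pixel."""
    probs = softmax(teacher_logits)
    conf = max(probs)
    return probs.index(conf), conf


def cross_entropy(student_logits, label):
    """Loss of a student's prediction against a (pseudo-)label for one pixel."""
    return -math.log(softmax(student_logits)[label])


# Hypothetical toy example: the large teacher tracks the large student by EMA ...
large_teacher = ema_update([0.0, 0.0], [1.0, 2.0], alpha=0.9)

# ... and its pseudo-label supervises a student, which may be a smaller network,
# because only the outputs (not the weights) have to be compatible.
label, confidence = pseudo_label([0.0, 2.0, 1.0])
small_student_loss = cross_entropy([0.5, 1.0, 0.2], label)
```

Under these assumptions, the size mismatch that makes `ema_update` fail for a heavyweight teacher and a lightweight student does not affect the pseudo-label loss, which is the gap the auxiliary large student is described as bridging.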

Paper Structure

This paper contains 21 sections, 11 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Semantic segmentation accuracy. (a) and (b) compare DUDA with DAFormer (Hoyer et al., 2022) and MIC (Hoyer et al., 2023), respectively. The data points, from left to right, use the MiT-B0, B1, B2, B4, and B5 backbones. (c) compares DUDA with recent UDA methods using DeepLab-V2 with a ResNet-101 backbone. Performance is measured on synthetic-to-real (GTA$\rightarrow$Cityscapes) adaptation.
  • Figure 2: A brief illustration of the proposed Distilled Unsupervised Domain Adaptation (DUDA) approach.
  • Figure 3: Qualitative results in benchmarks: GTA$\rightarrow$CS (1st row); SYN$\rightarrow$CS (2nd row); CS$\rightarrow$DZur (3rd row); CS$\rightarrow$ACDC (4th row).
  • Figure 4: Class-wise inconsistency and accuracy changes resulting from inconsistency-based loss balancing on GTA$\rightarrow$CS. (a) compares the class-wise true IoU disparity and the normalized inconsistency ($I'_{c}$) between MiT-B5 and MiT-B0 after pre-adaptation by DUDA$_\text{DAF}$. (b) shows the accuracy difference between the MiT-B0 models fine-tuned with and without the loss balancing.