DUDA: Distilled Unsupervised Domain Adaptation for Lightweight Semantic Segmentation
Beomseok Kang, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-Pang Chiu, Supun Samarasekera
TL;DR
DUDA addresses the challenge of achieving high-accuracy unsupervised domain adaptation for lightweight semantic segmentation by fusing EMA-based self-training with knowledge distillation through a three-network setup: a large teacher, a large student, and a small student. The framework introduces pre-adaptation with gradual distillation, inconsistency-based loss weighting to emphasize poorly adapted classes, and multi-teacher learning to improve pseudo-label quality, enabling lightweight models to match or exceed heavyweight baselines across four benchmarks. Empirical results show substantial gains for small backbones, lower memory use and FLOPs, and a narrower gap between Transformer-based and CNN-based architectures, with notable improvements in minority classes. The approach is model-agnostic and can operate in heterogeneous teacher–student configurations, offering practical benefits for edge devices and resource-constrained deployments in semantic segmentation tasks.
Abstract
Unsupervised Domain Adaptation (UDA) is essential for enabling semantic segmentation in new domains without requiring costly pixel-wise annotations. State-of-the-art (SOTA) UDA methods primarily use self-training with architecturally identical teacher and student networks, relying on Exponential Moving Average (EMA) updates. However, these approaches suffer substantial performance degradation with lightweight models: because EMA updates require the teacher to share the student's architecture, a lightweight student implies a lightweight teacher, which produces low-quality pseudo-labels. To address this, we propose Distilled Unsupervised Domain Adaptation (DUDA), a novel framework that combines EMA-based self-training with knowledge distillation (KD). Our method employs an auxiliary student network to bridge the architectural gap between heavyweight and lightweight models for EMA-based updates, resulting in improved pseudo-label quality. DUDA fuses UDA and KD through gradual distillation from large to small networks, an inconsistency loss that prioritizes poorly adapted classes, and learning with multiple teachers. Extensive experiments across four UDA benchmarks demonstrate that DUDA achieves SOTA performance with lightweight models, often surpassing the performance of heavyweight models from other approaches.
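To make the three-network idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes parameters are stored as dictionaries of arrays and logits have shape (pixels, classes). The function names, the `alpha` value, and the `class_weights` argument are hypothetical; in particular, `class_weights` is only a stand-in for the inconsistency-based weighting, whose exact form is not given in this summary.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """EMA step: teacher <- alpha * teacher + (1 - alpha) * student.

    Requires matching parameter names and shapes, which is why the
    large teacher is paired with a large (auxiliary) student.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_labels(teacher_logits):
    """Hard pseudo-labels and their confidences from teacher logits."""
    probs = softmax(teacher_logits)
    return probs.argmax(axis=-1), probs.max(axis=-1)

def distillation_loss(student_logits, labels, class_weights=None):
    """Cross-entropy of the small student against teacher pseudo-labels.

    class_weights (hypothetical) up-weights selected classes, e.g. ones
    judged to be poorly adapted.
    """
    log_probs = np.log(softmax(student_logits) + 1e-12)
    nll = -log_probs[np.arange(labels.size), labels]
    if class_weights is not None:
        nll = nll * class_weights[labels]
    return nll.mean()

# Toy usage: one EMA step, then distill pseudo-labels into a small student.
rng = np.random.default_rng(0)
large_teacher = {"w": rng.normal(size=(4, 3))}
large_student = {"w": rng.normal(size=(4, 3))}
large_teacher = ema_update(large_teacher, large_student)

features = rng.normal(size=(5, 4))                 # 5 target-domain pixels
labels, confidence = pseudo_labels(features @ large_teacher["w"])
small_student_logits = rng.normal(size=(5, 3))     # output of a different, smaller model
loss = distillation_loss(small_student_logits, labels, class_weights=np.ones(3))
```

The point of the sketch is the asymmetry: the EMA update only makes sense between the two architecturally identical large networks, while the small student receives knowledge solely through the loss on the large teacher's pseudo-labels.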
