ControlUDA: Controllable Diffusion-assisted Unsupervised Domain Adaptation for Cross-Weather Semantic Segmentation

Fengyi Shen, Li Zhou, Kagan Kucukaytekin, Ziyuan Liu, He Wang, Alois Knoll

TL;DR

ControlUDA tackles unsupervised domain adaptation for semantic segmentation under adverse weather by tuning a diffusion model with target priors derived from a pretrained segmentor, enabling controllable pseudo-target data generation conditioned on source labels. The framework introduces UDAControlNet for multi-scale conditioning and structure-aware prompt-enhanced diffusion, forming a closed loop from source labels to target-like images to improved segmentation. Empirical results on Cityscapes→ACDC show substantial gains over prior SOTA, including a peak of 72.0 mIoU, and ablation and generalization analyses demonstrate robustness across unseen weather and cross-dataset settings. This approach offers a practical, scalable pathway to leverage diffusion models for targeted cross-weather UDA with strong transferability.

Abstract

Data generation is recognized as a potent strategy for unsupervised domain adaptation (UDA) in semantic segmentation under adverse weather. Nevertheless, adverse weather encompasses multiple conditions, and high-fidelity data synthesis with controllable weather is under-researched in previous UDA works. Recent strides in large-scale text-to-image diffusion models (DMs) have opened a new avenue for research, enabling the generation of realistic images conditioned on semantic labels. This capability is instrumental for cross-domain data synthesis from the source to the target domain, owing to their shared label space: source domain labels can be paired with the generated pseudo target data for UDA training. However, from the UDA perspective, DM training faces several challenges: (i) ground-truth labels from the target domain are missing; (ii) the prompt generator may produce vague or noisy descriptions of adverse-weather images; (iii) existing methods often struggle to handle the complex scene structure and geometry of urban scenes when conditioned only on semantic labels. To tackle these issues, we propose ControlUDA, a diffusion-assisted framework tailored for UDA segmentation under adverse weather conditions. It first leverages a target prior from a pre-trained segmentor to tune the DM, compensating for the missing target domain labels. It also contains UDAControlNet, a condition-fused, multi-scale and prompt-enhanced network targeted at high-fidelity data generation in adverse weather. Training UDA with our generated data brings model performance to a new milestone (72.0 mIoU) on the popular Cityscapes-to-ACDC benchmark for adverse weather. Furthermore, ControlUDA helps achieve good model generalizability on unseen data.
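The 72.0 mIoU reported above is a mean intersection-over-union score. As a reminder of how such a score is formed, the following is a minimal sketch and not the benchmark's official evaluation code. It assumes flattened per-pixel class ids, predictions that lie in `[0, num_classes)`, and an ignore index of 255 (the Cityscapes convention); the function name `mean_iou` is illustrative. Benchmark evaluations typically accumulate intersections and unions over the whole validation set before averaging, whereas this sketch scores a single pair of label lists.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Return mIoU in percent, averaged over classes present in pred or gt."""
    inter = [0] * num_classes  # pixels where prediction and ground truth agree
    union = [0] * num_classes  # pixels claimed by either prediction or ground truth
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1
            union[p] += 1
    # Classes that never occur (union == 0) are left out of the average.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Two classes, one ignored pixel: each class reaches an IoU of 1/2.
score = mean_iou(pred=[0, 0, 1, 1], gt=[0, 1, 1, 255], num_classes=2)
```

Averaging per-class IoU rather than pixel accuracy keeps rare classes from being swamped by large ones such as road or sky.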

Paper Structure (13 sections, 5 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: An algorithmic overview of the ControlUDA framework. (a) depicts the training procedure of our UDAControlNet conditioned on prior knowledge from the target domain, as described in Sec. \ref{sec:training}; (b) demonstrates how data sampling can be performed with our trained UDAControlNet to synthesize various pseudo target data from a single source label (Sec. \ref{sec:inference}); (c) shows how the performance of domain adaptive semantic segmentation in adverse weather can be boosted via refinement training with our generated data (Sec. \ref{sec:task}).
  • Figure 2: Qualitative comparison of Cityscapes-to-ACDC adaptation on the ACDC val set. Columns from left to right are: target domain inputs; ground truths; segmentation predictions from DAFormer \cite{hoyer2022daformer}, HRDA \cite{hoyer2022hrda}, MIC \cite{hoyer2023mic} and ControlUDA (ours).
  • Figure 3: Visual comparison of different generative models. Given the same input semantic condition, outputs from different approaches are visualized. For OASIS, owing to its limitations, we sample until the desired weather appears.
  • Figure 4: Visual ablation of UDAControlNet components. Given the same input semantic condition, we train UDAControlNet with different configurations. From (a) to (f), each configuration adds one component on top of the previous one.
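The sampling and refinement stages in Figure 1 (b) and (c) rest on one idea: every image synthesized from a source label inherits that label as its ground truth. The sketch below illustrates only this pairing logic, under stated assumptions. The names `PseudoTargetSample`, `build_prompt`, `synthesize_pseudo_target` and `dummy_generate` are hypothetical, the prompt template is invented for illustration, and the generator is a blank-image stand-in for a trained conditional diffusion model, not the authors' UDAControlNet.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Label = List[List[int]]  # semantic label map: one class id per pixel
Image = List[List[int]]  # placeholder image type for this sketch


@dataclass
class PseudoTargetSample:
    image: Image   # synthesized target-like image
    label: Label   # source label reused as ground truth
    weather: str   # weather condition requested in the prompt


def build_prompt(weather: str) -> str:
    # Hypothetical template; the paper's prompt enhancement is not reproduced here.
    return f"a driving scene in {weather} conditions"


def synthesize_pseudo_target(
    labels: Sequence[Label],
    weathers: Sequence[str],
    generate: Callable[[Label, str], Image],
) -> List[PseudoTargetSample]:
    """Pair each source label with one generated image per weather condition."""
    samples = []
    for label in labels:
        for weather in weathers:
            image = generate(label, build_prompt(weather))
            samples.append(PseudoTargetSample(image, label, weather))
    return samples


def dummy_generate(label: Label, prompt: str) -> Image:
    # Stand-in generator: returns a blank image with the label's shape.
    return [[0 for _ in row] for row in label]


source_labels = [[[0, 1], [1, 0]]]
acdc_weathers = ["fog", "night", "rain", "snow"]  # the four ACDC conditions
dataset = synthesize_pseudo_target(source_labels, acdc_weathers, dummy_generate)
```

The resulting list can be fed to any supervised segmentation training loop, since each sample already carries an image and its label; in this sketch a single source label yields four training pairs, one per weather condition.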