
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji

TL;DR

CoDA addresses unsupervised domain adaptation (UDA) for semantic segmentation in adverse scenes. It combines two components: a Chain-of-Domain (CoD) strategy, which orders adaptation from easy to hard scenes through intermediate domains, and Severity-Aware Visual Prompt Tuning (SAVPT), which uses an image-level Severity metric to route training through two severity branches equipped with Meta-Visual Prompts and Meta-Adapters. The approach is supported by a training-free pipeline that generates a mini-dataset of intermediate images and by a teacher-student framework trained on pseudo-labels. CoDA reports state-of-the-art mIoU on the Foggy Driving and Foggy Zurich benchmarks and strong improvements on ACDC-based tasks. Key contributions include the first Chain-of-Thought (CoT)-inspired chain-of-domain strategy for UDA in adverse scenes, empirical evidence that SAVPT learns domain-invariant features and can be discarded at inference time, and substantial gains across multiple benchmarks. Overall, CoDA offers a lightweight, scalable path to robust perception under diverse adverse conditions, which is directly relevant to real-world autonomous systems.
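The TL;DR mentions a teacher-student framework trained on pseudo-labels but gives no details. The sketch below assumes the EMA mean-teacher self-training used by the HRDA and MIC baselines that CoDA is applied to: the teacher is an exponential moving average of the student, its argmax predictions serve as pseudo-labels, and the loss is scaled by the share of confident pixels. The momentum `alpha=0.999` and threshold `tau=0.968` are illustrative defaults rather than values taken from this paper, and "parameters" are plain NumPy arrays in a dict instead of a real network.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Replace each teacher entry with an EMA of itself and the student entry (dict updated in place)."""
    for name, w in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * w
    return teacher

def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_label(teacher_logits, tau=0.968):
    """Return hard pseudo-labels (H, W) and a scalar confidence weight.

    teacher_logits: array of shape (C, H, W).
    The weight is the fraction of pixels whose max probability exceeds tau.
    """
    probs = softmax(teacher_logits, axis=0)
    labels = probs.argmax(axis=0)
    weight = float((probs.max(axis=0) > tau).mean())
    return labels, weight

def weighted_ce(student_logits, labels, weight):
    """Pixel-wise cross-entropy of the student against pseudo-labels, scaled by the confidence weight."""
    log_probs = np.log(softmax(student_logits, axis=0) + 1e-12)
    h, w = labels.shape
    # Pick the log-probability of the pseudo-label class at every pixel.
    picked = log_probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -weight * picked.mean()
```

In a training loop, the teacher would produce `pseudo_label` on target images, the student would be optimized with `weighted_ce`, and `ema_update` would refresh the teacher after each step.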

Abstract

Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods perform poorly because they lack instructions, which leads their models to overlook the discrepancies among adverse scenes. To tackle this, we propose CoDA, which instructs models to distinguish, focus on, and learn from these discrepancies at the scene and image levels. Specifically, CoDA consists of a Chain-of-Domain (CoD) strategy and a Severity-Aware Visual Prompt Tuning (SAVPT) mechanism. CoD provides scene-level instructions: it divides all adverse scenes into easy and hard scenes and guides models to adapt first from the source domain to easy domains using easy-scene images, and then to hard domains using hard-scene images, laying a solid foundation for the whole adaptation. Building on this foundation, SAVPT adds finer-grained image-level instructions to boost performance. SAVPT introduces a novel metric, Severity, that divides all adverse-scene images into low-severity and high-severity images. Severity then directs visual prompts and adapters, instructing models to concentrate on unified severity features instead of scene-specific features, without adding complexity to the model architecture. CoDA achieves state-of-the-art performance on widely used benchmarks under all adverse scenes. Notably, CoDA outperforms existing methods by 4.6% and 10.3% mIoU on the Foggy Driving and Foggy Zurich benchmarks, respectively. Our code is available at https://github.com/Cuzyoung/CoDA
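The abstract does not say how Severity is computed. As a minimal sketch, assume each image comes with a boolean mask of "severe" pixels (Figure 3(b) visualizes severe and non-severe pixels, but the criterion is not given here), define a hypothetical image-level score as the fraction of severe pixels, and threshold it to choose a low- or high-severity branch. Each branch owns its own prompt tokens and a residual adapter, and both are skipped at inference, mirroring the claim that SAVPT can be discarded at inference time. `severity_score`, `SeverityBranches`, and the 0.5 threshold are hypothetical names and values, not the paper's Meta-Visual Prompts or Meta-Adapters.

```python
import numpy as np

def severity_score(severe_mask):
    """Hypothetical image-level Severity: fraction of pixels flagged as severe.

    severe_mask: boolean array (H, W). How pixels get flagged is NOT specified
    here; this sketch simply takes the mask as given.
    """
    return float(np.mean(severe_mask))

class SeverityBranches:
    """Two hypothetical branches (low / high severity), each holding its own
    prompt tokens and a residual adapter applied to token features."""

    def __init__(self, dim, num_prompts=4, threshold=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.threshold = threshold
        # Separate, small-initialized parameters per severity branch.
        self.prompts = {b: rng.normal(0.0, 0.02, (num_prompts, dim)) for b in ("low", "high")}
        self.adapters = {b: rng.normal(0.0, 0.02, (dim, dim)) for b in ("low", "high")}

    def route(self, severe_mask):
        """Pick the branch by comparing the image-level score with the threshold."""
        return "high" if severity_score(severe_mask) >= self.threshold else "low"

    def __call__(self, tokens, severe_mask, training=True):
        """tokens: (N, dim). In training, prepend the branch's prompts and add a
        residual adapter; at inference the branches are skipped entirely."""
        if not training:
            return tokens
        branch = self.route(severe_mask)
        adapted = tokens + tokens @ self.adapters[branch]
        return np.concatenate([self.prompts[branch], adapted], axis=0)
```

Because routing depends only on the severity score, a foggy image and a night image with similar scores would share a branch, which is one way to read "unified severity features instead of scene-specific features".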

Paper Structure (23 sections, 12 equations, 13 figures, 9 tables)

Figures (13)

  • Figure 1: (a) Current SOTA models [hoyer2023mic, hoyer2022hrda] trained on all adverse scenes within a target domain recognize most other classes well but struggle to recognize the sky in night scenes. The same models trained only on night scenes show the opposite behavior. Yellow circles in the predictions mark sky recognition, and white circles mark the recognition of other classes. (b) The traditional strategy adapts directly from source to target domains across chaotic gaps. Our Chain-of-Domain (CoD) strategy instructs models to adapt from source to target domains in order of scene difficulty by introducing intermediate domains.
  • Figure 2: A four-stage, training-free pipeline generates the mini-dataset, which serves as intermediate steps with diverse difficulty levels. All images are adverse-scene images with mild weather factors, generated from ACDC adverse-scene images, and they share features common to both target and source images. Notably, stages 1, 2, and 4 include manual steps based on human feedback.
  • Figure 3: (a) Data composition in the Cityscapes-to-ACDC experiments. (b) Visualization of SPT and Meta-Visual Prompts; purple pixels are severe and green pixels are non-severe. (c) and (d) show the details of CoDA's architecture and pipeline.
  • Figure 4: Ablation studies on Cityscapes (CS) to ACDC val. The x- and y-axes show mIoU and iteration, respectively. The purple, green, and red lines denote the original models with the traditional strategy, with the CoD strategy, and with the CoD + traditional strategy that we implement in CoDA, respectively.
  • Figure 5: Quantitative comparison of MIC, MIC trained with CoDA, and MIC trained with CoDA but without SAVPT at inference time. The results show that CoDA understands all scenes better and that SAVPT enhances the models' abilities.
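Figures 1(b) and 4 contrast direct (traditional) adaptation with the CoD chain, and Figure 4 indicates that CoDA combines CoD with the traditional strategy. The sketch below models only the schedule: target-domain groups ordered from easy to hard, one CoD stage per group, followed by an optional stage that adapts to all groups at once. Placing the traditional stage last, the group names, and the iteration counts are assumptions for illustration; the actual data composition is the one shown in Figure 3(a).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Stage:
    name: str
    target_domains: List[str]
    iterations: int

def chain_of_domain_schedule(groups, iters_per_stage, final_iters=0):
    """Return training stages ordered easy -> hard.

    groups: list of domain-name lists, already sorted from easiest to hardest.
    Each CoD stage adapts from the source to one group only. If final_iters > 0,
    a last 'traditional' stage (assumed placement) adapts to all groups together.
    """
    stages = [Stage(f"cod_{i}", list(g), iters_per_stage) for i, g in enumerate(groups)]
    if final_iters > 0:
        all_domains = [d for g in groups for d in g]
        stages.append(Stage("traditional", all_domains, final_iters))
    return stages

def total_iterations(stages):
    """Sum of iterations over all stages."""
    return sum(s.iterations for s in stages)

if __name__ == "__main__":
    # Hypothetical groups: intermediate mini-dataset, then easy, then hard scenes.
    schedule = chain_of_domain_schedule(
        [["intermediate"], ["easy_scenes"], ["hard_scenes"]],
        iters_per_stage=1000,
        final_iters=2000,
    )
    for s in schedule:
        print(s.name, s.target_domains, s.iterations)
```

Keeping the schedule separate from the training step makes it easy to run the three variants compared in Figure 4: pass `final_iters` only (traditional), groups only (CoD), or both (CoD + traditional).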