CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji
TL;DR
CoDA addresses unsupervised domain adaptation (UDA) for semantic segmentation across adverse scenes with two components. Chain-of-Domain Adaptation (CoD) structures learning from easy to hard scenes via intermediate domains. Severity-Aware Visual Prompt Tuning (SAVPT) uses an image-level Severity metric to route learning through two severity branches with Meta-Visual Prompts and Meta-Adapters. The approach is reinforced by a training-free mini-dataset generator and a teacher-student loss framework that leverages pseudo-labels, yielding state-of-the-art mIoU on the Foggy Driving and Foggy Zurich benchmarks and strong improvements on ACDC-based tasks. Key contributions include the first Chain-of-Thought (CoT)-inspired CoD strategy for UDA in adverse scenes, empirical validation that SAVPT learns domain-invariant features and can be discarded at inference time, and substantial performance gains across multiple benchmarks. Overall, CoDA offers a lightweight, scalable path to robust perception under diverse adverse conditions, with implications for real-world autonomous systems.
Abstract
Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods perform poorly because they lack instructions, which leads their models to overlook the discrepancies among adverse scenes. To tackle this, we propose CoDA, which instructs models to distinguish, focus on, and learn from these discrepancies at the scene and image levels. Specifically, CoDA consists of a Chain-of-Domain (CoD) strategy and a Severity-Aware Visual Prompt Tuning (SAVPT) mechanism. CoD provides scene-level instructions that divide all adverse scenes into easy and hard scenes, guiding models to adapt first from the source to easy domains with easy scene images, and then to hard domains with hard scene images, thereby laying a solid foundation for the whole adaptation. Building upon this foundation, we employ SAVPT to provide more detailed image-level instructions that further boost performance. SAVPT features a novel metric, Severity, that divides all adverse scene images into low-severity and high-severity images. Severity then directs visual prompts and adapters, instructing models to concentrate on unified severity features instead of scene-specific features, without adding complexity to the model architecture. CoDA achieves state-of-the-art (SOTA) performance on widely used benchmarks under all adverse scenes. Notably, CoDA outperforms existing methods by 4.6% and 10.3% mIoU on the Foggy Driving and Foggy Zurich benchmarks, respectively. Our code is available at https://github.com/Cuzyoung/CoDA
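To make the two levels of instruction concrete, the following is a minimal Python sketch of the control flow only, not the released implementation. It assumes that CoD can be viewed as an iteration-based schedule over target subsets and that Severity is a scalar score per image compared against a threshold. The names `Stage`, `stage_for_iter`, `severity_branch`, the iteration boundaries, and the threshold value are all hypothetical; the abstract does not specify how Severity is computed or when the switch from easy to hard scenes happens.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One link in a hypothetical easy-to-hard adaptation chain."""
    name: str      # label of the target subset used in this stage
    end_iter: int  # exclusive upper bound of the stage, in iterations


def stage_for_iter(it: int, stages: Sequence[Stage]) -> str:
    """Return the name of the stage that is active at iteration `it`.

    Stages are assumed to be sorted by `end_iter`; iterations past the
    last boundary stay in the final (hardest) stage.
    """
    for stage in stages:
        if it < stage.end_iter:
            return stage.name
    return stages[-1].name


def severity_branch(severity: float, threshold: float = 0.5) -> str:
    """Route an image to the low- or high-severity branch.

    `severity` is an assumed scalar score; the threshold is illustrative.
    """
    return "low" if severity < threshold else "high"


# Hypothetical schedule: adapt on easy scenes first, then on hard scenes.
SCHEDULE = (Stage("easy", 4000), Stage("hard", 8000))

if __name__ == "__main__":
    for it, sev in [(0, 0.2), (3999, 0.7), (4000, 0.5)]:
        print(it, stage_for_iter(it, SCHEDULE), severity_branch(sev))
```

In this sketch the scene-level decision depends only on training progress, while the image-level decision depends only on the image's score, which mirrors the separation between CoD and SAVPT described above.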
