Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation
Shen Zheng, Anurag Ghosh, Srinivasa G. Narasimhan
TL;DR
The paper tackles unsupervised domain adaptation for driving perception under adverse conditions, where dominant backgrounds vary sharply across domains and impede learning from foreground objects. It introduces Instance-Warp, a training-time in-place image warping approach guided by instance-level saliency that oversamples foreground objects and reduces background bias, paired with a feature unwarping step that keeps predictions consistent with the unwarped image space. The method is agnostic to the downstream task, domain adaptation algorithm, saliency guidance, and model architecture, and it integrates with DAFormer and 2PCNet-based pipelines while adding no test-time latency. Empirically, it improves domain-adaptive object detection (e.g., +6.1 mAP50 on BDD100K Clear→DENSE Foggy, +3.7 mAP50 on BDD100K Day→Night, +3.0 mAP50 on BDD100K Clear→Rainy) and semantic segmentation (e.g., +6.3 mIoU on Cityscapes→ACDC), with minimal training memory overhead and no additional inference cost. The approach derives an instance-level saliency map from bounding boxes to steer warping intensity and shows that focusing on salient foregrounds enhances backbone features and cross-domain generalization, while acknowledging limitations in densely populated scenes and on certain synthetic datasets.
Abstract
Driving is challenging in conditions like night, rain, and snow. The lack of good labeled datasets has hampered progress in scene understanding under such conditions. Unsupervised Domain Adaptation (UDA) using large labeled clear-day datasets is a promising research direction in such cases. However, many UDA methods are trained with dominant scene backgrounds (e.g., roads, sky, sidewalks) that appear dramatically different across domains. As a result, they struggle to learn effective features of smaller and often sparse foreground objects (e.g., people, vehicles, signs). In this work, we improve UDA training by applying in-place image warping to focus on salient objects. We design instance-level saliency guidance to adaptively oversample object regions and undersample background areas, which reduces adverse effects from background context and enhances backbone feature learning. Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task (segmentation, detection), domain adaptation algorithm, saliency guidance, and underlying model architecture. Highlights include +6.1 mAP50 for BDD100K Clear $\rightarrow$ DENSE Foggy, +3.7 mAP50 for BDD100K Day $\rightarrow$ Night, +3.0 mAP50 for BDD100K Clear $\rightarrow$ Rainy, and +6.3 mIoU for Cityscapes $\rightarrow$ ACDC. Our method also adds minimal training memory overhead and no additional inference latency. Code is available at https://github.com/ShenZheng2000/Instance-Warp
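
To make the idea of oversampling object regions and undersampling background concrete, the following is a minimal one-dimensional sketch. It is not the authors' implementation: it assumes a simple inverse-CDF resampling scheme along a single image axis, and the function names (`box_saliency_1d`, `warp_positions`) and the weights `background` and `foreground` are hypothetical choices made only for illustration. The feature unwarping step is not shown.

```python
from bisect import bisect_left


def box_saliency_1d(width, boxes, background=1.0, foreground=4.0):
    """Hypothetical saliency along one axis.

    Every pixel gets a positive background weight; pixels inside a
    box span [start, end) get extra foreground weight on top of it.
    """
    saliency = [background] * width
    for start, end in boxes:
        for x in range(max(0, start), min(width, end)):
            saliency[x] += foreground
    return saliency


def warp_positions(saliency, out_size):
    """Inverse-CDF sampling (an assumed stand-in for the warp).

    Returns out_size source coordinates in [0, len(saliency)] that are
    packed more densely wherever the saliency is high.
    """
    # Cumulative saliency; cum[k] is the mass of pixels 0..k-1.
    cum = [0.0]
    for s in saliency:
        cum.append(cum[-1] + s)
    total = cum[-1]

    positions = []
    for i in range(out_size):
        # Target mass at the centre of output pixel i.
        target = (i + 0.5) / out_size * total
        # First index whose cumulative mass reaches the target.
        k = min(bisect_left(cum, target), len(saliency))
        lo, hi = cum[k - 1], cum[k]
        # Linear interpolation inside source pixel k - 1.
        positions.append((k - 1) + (target - lo) / (hi - lo))
    return positions


# Example: one box covering 20% of a 100-pixel axis.
saliency = box_saliency_1d(100, [(40, 60)])
positions = warp_positions(saliency, 100)
inside = sum(1 for p in positions if 40 <= p < 60)
print(f"{inside} of {len(positions)} samples fall inside the box")
```

With these illustrative weights, the box spans a fifth of the axis yet receives more than half of the output samples, which is the oversampling effect the abstract describes. A two-dimensional version would apply the same idea along both axes and resample the image at the resulting grid; because the mapping is monotonic, it can be inverted to bring features back to the unwarped space.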
