Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation

Shen Zheng, Anurag Ghosh, Srinivasa G. Narasimhan

TL;DR

The paper tackles the challenge of unsupervised domain adaptation for driving perception under adverse conditions by mitigating background-driven variance that impedes learning from foreground objects. It introduces Instance-Warp, a training-time in-place image warping approach guided by instance-level saliency to oversample foreground objects and reduce background bias, paired with a feature unwarping step to keep predictions consistent with unwarped space. The method is agnostic to the specific downstream task, domain adaptation algorithm, saliency guidance, and model architecture, and it integrates with DAFormer and 2PCNet-based pipelines while maintaining zero test-time latency. Empirically, it yields notable improvements across domain adaptation for object detection (e.g., +6.1 mAP50 in BDD100K Clear→DENSE Foggy, +3.7 mAP50 Day→Night, +3.0 mAP50 Clear→Rainy) and semantic segmentation (e.g., +6.3 mIoU Cityscapes→ACDC), with minimal training memory overhead and no additional inference cost. The approach relies on an instance-level saliency map derived from bounding boxes to steer warping intensity and demonstrates that focusing on salient foregrounds enhances backbone features and cross-domain generalization, while acknowledging limitations in densely populated scenes and certain synthetic datasets.

Abstract

Driving is challenging in conditions like night, rain, and snow. The lack of good labeled datasets has hampered progress in scene understanding under such conditions. Unsupervised Domain Adaptation (UDA) using large labeled clear-day datasets is a promising research direction in such cases. However, many UDA methods are trained on images dominated by scene backgrounds (e.g., roads, sky, sidewalks) that appear dramatically different across domains. As a result, they struggle to learn effective features of smaller and often sparse foreground objects (e.g., people, vehicles, signs). In this work, we improve UDA training by applying in-place image warping to focus on salient objects. We design instance-level saliency guidance to adaptively oversample object regions and undersample background areas, which reduces adverse effects from background context and enhances backbone feature learning. Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task (segmentation, detection), domain adaptation algorithm, saliency guidance, and underlying model architecture. Result highlights include +6.1 mAP50 for BDD100K Clear → DENSE Foggy, +3.7 mAP50 for BDD100K Day → Night, +3.0 mAP50 for BDD100K Clear → Rainy, and +6.3 mIoU for Cityscapes → ACDC. In addition, our method adds minimal training memory and no additional inference latency. Code is available at https://github.com/ShenZheng2000/Instance-Warp
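
The TL;DR states that the instance-level saliency map is derived from bounding boxes and steers warping intensity, but this summary does not give the exact construction. The sketch below is therefore a hypothetical version under stated assumptions: each box adds one Gaussian bump with unit peak (so a small, distant object receives the same peak attention as a large, nearby one), its width scaled to the box size by an assumed `spread` factor, on top of a small assumed `background` floor. The function name and both parameters are illustrative and do not come from the authors' code.

```python
import math


def instance_saliency(boxes, height, width, spread=0.5, background=1e-3):
    """Hypothetical instance-level saliency map built from bounding boxes.

    Each box (x1, y1, x2, y2), in pixels, adds one anisotropic Gaussian whose
    peak is 1 regardless of box size. `spread` (assumed) sets the Gaussian
    width relative to the box size; `background` (assumed) is a floor for
    non-object pixels. The returned height x width map sums to 1.
    """
    sal = [[background] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        sx = max(spread * (x2 - x1), 1e-6)
        sy = max(spread * (y2 - y1), 1e-6)
        for i in range(height):
            for j in range(width):
                # Distance from the pixel center (j + 0.5, i + 0.5) to the box center.
                dx = (j + 0.5 - cx) / sx
                dy = (i + 0.5 - cy) / sy
                sal[i][j] += math.exp(-0.5 * (dx * dx + dy * dy))
    total = sum(sum(row) for row in sal)
    return [[v / total for v in row] for row in sal]


# A small, distant car and a large, nearby truck on a 32 x 64 canvas.
sal = instance_saliency([(4, 10, 8, 14), (30, 4, 60, 30)], height=32, width=64)
print(sal[12][6] / sal[17][45])  # small-box peak relative to large-box peak
```

Giving every instance the same peak, rather than weighting by area, is one way to keep small objects from being drowned out by large ones; an area-weighted variant would favor nearby vehicles instead.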

Paper Structure (23 sections, 8 equations, 15 figures, 18 tables)

Figures (15)

  • Figure 1: Object-Background Pixel Imbalance & Differences in Cross-Domain Object-Background Variations. Consider scenes from different domains: clear day, rainy, and night. Red highlights foreground objects and yellow highlights background areas. Note that (a) background pixels occupy more space than foreground object pixels, and (b) background elements like roads exhibit higher cross-domain variation, while foreground objects like cars show smaller variation. Thus, focusing on objects would mitigate over-reliance on the background and improve domain adaptation.
  • Figure 2: (a) Differences in Cross-Domain Object-Background Variations. We visualize t-SNE [van2008visualizing] plots of ResNet-50 features extracted from ACDC [sakaridis2021acdc] foggy and rainy images. The top plot shows image features (the majority of an image is background), while the bottom plot shows car features (the most common object). Image features vary more across domains, whereas car features from the two domains are tightly entangled. Thus, over-reliance on the more variable background context makes adaptation difficult. (b) Saliency Guided Image Warping increases the relative size of salient foreground regions. Zoom in to see the car enlarged in the highlighted region, which reduces the effect of background context. (c) The learned backbone features are better: they focus more on the object and rely less on the background.
  • Figure 3: Saliency Guided Image Warping for Unsupervised Domain Adaptation. Consider the standard UDA framework (see the overview and domain adaptation subsections) with supervised pre-training and unsupervised self-training phases (as shown here). The warping and unwarping components are marked in cyan. We warp images using saliency guidance to oversample salient image regions, encouraging better feature learning in the backbone. Our instance-level saliency guidance oversamples object regions and performs better than the Static Prior [thavamani2021fovea, thavamani2023learning] and the Geometric Prior [ghosh2023learned]. We unwarp features before predicting labels, which ensures that labels are never warped and that the UDA losses of the employed algorithm (e.g., [hoyer2022daformer, kennerley20232pcnet, hoyer2022hrda, hoyer2023mic, li2022cross, deng2021unbiased, zhang2021prototypical]) remain unmodified. We do not warp or unwarp at test time.
  • Figure 4: Image Warping with Different Saliency Guidance Functions. In-place warping follows a zero-sum pixel constraint: enlarging one region necessitates shrinking another. Bounding boxes mark small, medium, and large objects. Our instance-level saliency guidance oversamples objects and undersamples the background. In contrast, Static Prior fails when object locations do not align with the dataset's average object location, while Geometric Prior fails for small objects not near the vanishing point.
  • Figure 5: Warping and Unwarping. Shown are warped images with two saliency scales, $s \in \{1, 16\}$. A higher saliency scale implies less intense warping. While unwarping is applied to features in practice, here we apply it to the warped image for illustration. Although saliency guided warping and unwarping are lossy, we observe that the error between the original image and the warped-then-unwarped image is very low, indicating minimal information loss.
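
Figures 4 and 5 describe in-place warping under a zero-sum pixel budget and an approximate unwarp. The minimal 1D sketch below illustrates both ideas using the saliency sampler of Recasens et al. ("Learning to Zoom"), a common kernel-based formulation for saliency-guided warping: each output position t reads the source coordinate u(t) = Σ_j S_j k(p_j, t) p_j / Σ_j S_j k(p_j, t), with a Gaussian kernel k. Whether Instance-Warp uses exactly this formulation is an assumption, as are the kernel width `sigma`, the helper names, and the interpolation-based `unwarp`. With a Gaussian kernel, u(t) is monotone in t, so it can be inverted numerically to map back to unwarped coordinates.

```python
import bisect
import math


def interp(xs, ys, x):
    """Piecewise-linear interpolation on increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect.bisect_right(xs, x)  # xs[k - 1] <= x < xs[k]
    a = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] * (1 - a) + ys[k] * a


def saliency_grid(saliency, out_len, sigma=0.1):
    """Source coordinate in [0, 1] read by each output pixel (1D saliency sampler).

    u(t) = sum_j S_j k(p_j, t) p_j / sum_j S_j k(p_j, t) with a Gaussian kernel,
    so salient source pixels pull nearby output samples toward themselves.
    """
    n = len(saliency)
    src = [(j + 0.5) / n for j in range(n)]  # source pixel centers
    grid = []
    for i in range(out_len):
        t = (i + 0.5) / out_len  # output pixel center
        w = [s * math.exp(-0.5 * ((p - t) / sigma) ** 2)
             for s, p in zip(saliency, src)]
        grid.append(sum(wi * p for wi, p in zip(w, src)) / sum(w))
    return src, grid


def warp(signal, src, grid):
    """Output pixel i reads the input signal at source coordinate grid[i]."""
    return [interp(src, signal, u) for u in grid]


def unwarp(warped, src, grid):
    """Approximate inverse: for each source pixel p, find the output position t
    with grid(t) = p (the grid is monotone) and read the warped signal there."""
    out_pos = [(i + 0.5) / len(grid) for i in range(len(grid))]
    return [interp(out_pos, warped, interp(grid, out_pos, p)) for p in src]


# A 32-pixel row whose middle quarter (an "object") is 10x more salient.
n = 32
saliency = [10.0 if 12 <= j < 20 else 1.0 for j in range(n)]
src, grid = saliency_grid(saliency, out_len=n)
on_object = sum(0.375 <= u < 0.625 for u in grid)
print(on_object, "of", n, "output samples land on the object (uniform: 8)")
```

Because the output length equals the input length, every extra sample spent on the object is taken from the background, mirroring the zero-sum constraint in Figure 4. This sketch does no border handling, so samples near the edges are pulled slightly inward and those edge pixels are not recovered by `unwarp`; away from the borders, the warp-then-unwarp round trip of a smooth signal stays close to the original, in the spirit of Figure 5.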