
Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions

Kaiwen Wang, Yinzhe Shen, Martin Lauer

TL;DR

A Zoom-in Zoom-out strategy is proposed to address small objects under poor visibility conditions: it zooms in on images to obtain better pseudo-labels, and zooms out images and pseudo-labels to learn refined features.
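The Zoom-in Zoom-out strategy requires some geometric bookkeeping. Whenever an image is rescaled, its boxes must be rescaled by the same factor so that pseudo-labels stay aligned with the pixels. The minimal sketch below assumes images are NumPy arrays and boxes are (x1, y1, x2, y2) pixel coordinates. The hypothetical `zoom` helper uses nearest-neighbour resampling so that it needs no dependencies. This is not the authors' implementation, and the zoom factors are purely illustrative.

```python
import numpy as np

def zoom(image: np.ndarray, boxes: np.ndarray, factor: float):
    """Rescale an H x W (or H x W x C) image and its boxes by `factor`.

    Hypothetical helper: factor > 1 zooms in (enlarges), factor < 1 zooms out.
    `boxes` is an (N, 4) array of (x1, y1, x2, y2) pixel coordinates.
    """
    h, w = image.shape[:2]
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    sy, sx = new_h / h, new_w / w  # actual scale after rounding to whole pixels
    # Nearest-neighbour sampling: map each output pixel centre back to a source pixel.
    rows = np.minimum(((np.arange(new_h) + 0.5) / sy).astype(int), h - 1)
    cols = np.minimum(((np.arange(new_w) + 0.5) / sx).astype(int), w - 1)
    zoomed = image[rows[:, None], cols[None, :]]
    # Boxes scale with the image so pseudo-labels stay aligned with the pixels.
    return zoomed, boxes * np.array([sx, sy, sx, sy])

img = np.arange(16, dtype=float).reshape(4, 4)
boxes = np.array([[0.0, 0.0, 2.0, 2.0]])
big, big_boxes = zoom(img, boxes, 2.0)       # zoom-in: 8 x 8 image, box (0, 0, 4, 4)
small, small_boxes = zoom(img, boxes, 0.5)   # zoom-out: 2 x 2 image, box (0, 0, 1, 1)
```

A practical pipeline would more likely use bilinear interpolation from an image library. Only the box arithmetic carries over unchanged.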

Abstract

Existing object detectors encounter challenges in handling domain shifts between training and real-world data, particularly under poor visibility conditions like fog and night. Cutting-edge cross-domain object detection methods use teacher-student frameworks and compel teacher and student models to produce consistent predictions under weak and strong augmentations, respectively. In this paper, we reveal that manually crafted augmentations are insufficient for optimal teaching and present a simple yet effective framework named Adversarial Defense Teacher (ADT), leveraging adversarial defense to enhance teaching quality. Specifically, we employ adversarial attacks, encouraging the model to generalize to subtly perturbed inputs that effectively deceive it. To address small objects under poor visibility conditions, we propose a Zoom-in Zoom-out strategy, which zooms in on images to obtain better pseudo-labels and zooms out images and pseudo-labels to learn refined features. Our results demonstrate that ADT achieves superior performance, reaching 54.5% mAP on Foggy Cityscapes, surpassing the previous state-of-the-art by 2.6% mAP.
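The abstract does not specify which attack ADT uses. The following is a minimal sketch of one common choice, a single-step Fast Gradient Sign Method (FGSM) attack, applied to a toy linear softmax classifier that stands in for the student. The loss being maximized is the cross-entropy between the student's prediction and the teacher's pseudo-label, which mirrors the inconsistency loss that the model is attacked on. The L-infinity budget `eps` keeps the perturbation small, in line with a perturbation that stays imperceptible. All names (`fgsm_attack`, `inconsistency_loss`, `eps`) and the linear model are assumptions. The linear model was chosen because its input gradient can be written by hand.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def inconsistency_loss(W, b, x, pseudo_label):
    """Cross-entropy of the toy student's prediction against the teacher's hard pseudo-label."""
    p = softmax(W @ x + b)
    return -np.log(p[pseudo_label])

def fgsm_attack(W, b, x, pseudo_label, eps):
    """One signed-gradient step that increases the inconsistency loss (assumed FGSM variant).

    For logits z = W x + b with cross-entropy loss, dL/dz = p - onehot(y),
    so dL/dx = W^T (p - onehot(y)).
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[pseudo_label] = 1.0
    grad_x = W.T @ (p - onehot)
    # Bounded L-infinity step, then clip back to the valid pixel range [0, 1].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)   # toy 3-class linear "student" (hypothetical)
x = rng.uniform(0.2, 0.8, size=8)             # toy "image" with 8 pixels
y_teacher = 1                                 # teacher's pseudo-label (hypothetical)
x_adv = fgsm_attack(W, b, x, y_teacher, eps=0.03)
print("loss before:", inconsistency_loss(W, b, x, y_teacher),
      "after:", inconsistency_loss(W, b, x_adv, y_teacher))
```

In this toy setting, the loss is convex in `x`, so the signed step is guaranteed to increase it. In a real detector, the input gradient would come from automatic differentiation through the detection losses. The student would then be trained to produce consistent predictions on `x_adv`.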

Paper Structure (27 sections, 10 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Current self-training methods enforce consistent predictions on original and strongly augmented data. However, this can be insufficient: manual augmentation visibly alters the appearance, yet the model's predictions remain similar to those on the original data. In contrast, we add an adversarial perturbation to the augmented data that remains imperceptible to humans (and thus stays within the same domain) but effectively deceives the model. The adversarial attack induces highly inconsistent predictions, thereby improving the quality of mutual learning. Green boxes denote true positives while yellow boxes indicate misclassifications. Best viewed in color.
  • Figure 2: Overview of the proposed Adversarial Defense Teacher. Our model includes two branches: 1) supervised branch (blue lines): strongly augmented source data is fed into the student model. 2) unsupervised branch (orange lines): the teacher model processes weakly augmented and zoomed-in data to generate pseudo-labels with high confidence. Adversarial attacks (dashed lines) are conducted on the student model based on the inconsistency loss $\mathcal{L}_{\text{attack}}$ between pseudo-labels and predictions on strongly augmented and zoomed-out data. The resulting adversarial examples are reintroduced to the student model. Best viewed in color.
  • Figure 3: Adversarial attacks based on different losses deceive the model in different ways. Green boxes denote true positives while yellow boxes indicate misclassifications. Best viewed in color.
  • Figure 4: Qualitative results of foggy adaptation for source model (left column), CMT (middle column) and Ours (right column). Green, red and orange boxes denote true positives, false negatives and false positives, respectively. We set the score threshold to 0.8 and evaluate all models on images resized to a shorter side of 600 pixels.
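Figure 2 shows the teacher producing high-confidence pseudo-labels from zoomed-in data. This sketch makes two assumptions: the teacher's boxes are predicted in the zoomed-in image's pixel coordinates, and "high confidence" means a fixed score threshold. Under those assumptions, the step reduces to a filter followed by undoing the zoom. The helper name and the threshold value are hypothetical. The summary does not state the training threshold; the 0.8 in Figure 4 is only a visualization setting.

```python
import numpy as np

def pseudo_labels_from_zoomed_view(boxes, scores, labels, zoom_in_factor, score_threshold):
    """Turn teacher detections on a zoomed-in image into pseudo-labels (hypothetical helper).

    boxes: (N, 4) array of (x1, y1, x2, y2) in zoomed-in pixel coordinates.
    scores, labels: (N,) confidence scores and class ids.
    """
    keep = scores >= score_threshold  # keep only high-confidence detections
    # Undo the zoom-in so boxes live in the original image's coordinate frame.
    return boxes[keep] / zoom_in_factor, labels[keep]

boxes = np.array([[20.0, 40.0, 60.0, 80.0], [100.0, 100.0, 140.0, 160.0]])
scores = np.array([0.92, 0.41])
labels = np.array([2, 5])
pl_boxes, pl_labels = pseudo_labels_from_zoomed_view(
    boxes, scores, labels, zoom_in_factor=2.0, score_threshold=0.7)  # illustrative values
# pl_boxes -> [[10., 20., 30., 40.]], pl_labels -> [2]
```

The resulting boxes are in the original image's coordinates. They can be rescaled again by whatever zoom-out factor the student's input uses.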