Transferable Adversarial Attacks for Image and Video Object Detection

Xingxing Wei, Siyuan Liang, Ning Chen, Xiaochun Cao

TL;DR

The paper introduces the Unified and Efficient Adversary (UEA), a GAN-based framework that quickly generates transferable adversarial images and video frames to attack both proposal-based and regression-based object detectors. By incorporating a multi-scale attention feature loss and a DAG-inspired misclassification objective within a conditional GAN, UEA achieves high transferability and generates adversarial examples orders of magnitude faster than prior optimization-based attacks. Experiments on PASCAL VOC and ImageNet VID demonstrate strong attack performance and efficiency for both image and video detection tasks. The work raises black-box robustness concerns for object detection and offers a practical benchmark for evaluating defenses across modalities and detector architectures.

Abstract

Adversarial examples have been demonstrated to threaten many computer vision tasks, including object detection. However, the existing attack methods for object detection have two limitations: poor transferability, meaning that the generated adversarial examples have a low success rate when attacking other kinds of detection methods, and high computation cost, meaning that they need more time to generate an adversarial image and are therefore ill-suited to video data. To address these issues, we utilize a generative mechanism to obtain the adversarial image and video. In this way, the processing time is reduced. To enhance the transferability, we destroy the feature maps extracted from the feature network, which usually constitutes the basis of object detectors. The proposed method is based on the Generative Adversarial Network (GAN) framework, where we combine the high-level class loss and low-level feature loss to jointly train the adversarial example generator. A series of experiments conducted on the PASCAL VOC and ImageNet VID datasets shows that our method can efficiently generate image and video adversarial examples and, more importantly, that these adversarial examples have better transferability and are thus able to simultaneously attack two kinds of representative object detection models: proposal-based models like Faster-RCNN, and regression-based models like SSD.

Paper Structure

This paper contains 16 sections, 5 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: A comparison of DAG (Dense Adversary Generation) and our UEA (Unified and Efficient Adversary) against proposal-based and regression-based detectors. In the first row, Faster-RCNN and SSD300 detect the correct objects. The second row shows the adversarial examples from DAG: the attack succeeds against Faster-RCNN but fails against SSD300. In the third row, which shows the adversarial examples from UEA, neither Faster-RCNN nor SSD300 detects the cars. Moreover, UEA generates an adversarial image almost 1000 times faster than DAG.
  • Figure 2: The training framework of the Unified and Efficient Adversary (UEA). Besides the GAN loss and similarity loss, we incorporate DAG's high-level class loss and our low-level multi-scale attention feature loss into the GAN framework to jointly train a generator. In the testing phase, the generator outputs adversarial images or video frames to fool different classes of object detectors (blue dashed box).
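
The four loss terms named in the Figure 2 caption can be pictured as one weighted generator objective. The following is a minimal sketch of that arithmetic only, not the authors' implementation: the weighted-sum form, the `LossWeights` values, the function names, and the attention-weighted squared-distance form of the feature loss are all assumptions made for illustration, since the summary above does not give the paper's equations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical trade-off weights; placeholder values, not the paper's."""
    similarity: float = 1.0
    class_loss: float = 1.0
    feature: float = 1.0


def attention_feature_loss(
    features: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    attention: Sequence[Sequence[float]],
) -> float:
    """Assumed form: sum over scales of attention-weighted squared distances.

    Each argument holds one flattened map per scale (feature layer).
    """
    total = 0.0
    for feat, tgt, att in zip(features, targets, attention):
        if not (len(feat) == len(tgt) == len(att)):
            raise ValueError("maps at the same scale must have equal sizes")
        total += sum(a * (f - t) ** 2 for f, t, a in zip(feat, tgt, att))
    return total


def generator_objective(
    gan: float,
    similarity: float,
    class_loss: float,
    feature: float,
    weights: LossWeights = LossWeights(),
) -> float:
    """Combine the four scalar loss terms into one value (assumed weighted sum)."""
    return (
        gan
        + weights.similarity * similarity
        + weights.class_loss * class_loss
        + weights.feature * feature
    )


if __name__ == "__main__":
    # Two toy "scales": a 2-element map and a 1-element map.
    fea = attention_feature_loss(
        features=[[1.0, 2.0], [0.0]],
        targets=[[0.0, 0.0], [1.0]],
        attention=[[1.0, 0.5], [2.0]],
    )
    total = generator_objective(
        gan=1.0, similarity=2.0, class_loss=3.0, feature=fea,
        weights=LossWeights(similarity=0.5, class_loss=1.0, feature=0.1),
    )
    print(fea, total)
```

In an actual training loop these scalars would be differentiable tensors produced by the discriminator, the detector's classification head, and the feature network; the sketch only shows how the terms might be weighted and summed.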