Transferable Adversarial Attacks for Image and Video Object Detection
Xingxing Wei, Siyuan Liang, Ning Chen, Xiaochun Cao
TL;DR
The paper introduces the Unified and Efficient Adversary (UEA), a GAN-based framework that quickly generates transferable adversarial images and video frames to attack both proposal-based and regression-based object detectors. By combining a multi-scale attention feature loss with a misclassification objective inspired by DAG (Dense Adversary Generation) inside a conditional GAN, UEA achieves high transferability and generates adversarial examples orders of magnitude faster than prior optimization-based attacks. Experiments on PASCAL VOC and ImageNet VID demonstrate strong attack performance and efficiency for both image and video detection. The work raises black-box robustness concerns for object detection and offers a practical benchmark for evaluating defenses across modalities and detector architectures.
Abstract
Adversarial examples have been shown to threaten many computer vision tasks, including object detection. However, existing attack methods for object detection have two limitations: poor transferability, meaning that the generated adversarial examples have a low success rate when attacking other kinds of detectors, and high computation cost, meaning that generating a single adversarial image takes so long that the methods are difficult to apply to video data. To address these issues, we use a generative mechanism to produce adversarial images and videos, which reduces the processing time. To enhance transferability, we destroy the feature maps extracted by the feature network, which usually forms the basis of object detectors. The proposed method is based on the Generative Adversarial Network (GAN) framework, in which we combine a high-level class loss and a low-level feature loss to jointly train the adversarial example generator. Experiments on the PASCAL VOC and ImageNet VID datasets show that our method efficiently generates image and video adversarial examples and, more importantly, that these examples transfer better and can therefore simultaneously attack two representative kinds of object detection models: proposal-based models such as Faster-RCNN and regression-based models such as SSD.
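
As a rough illustration of how a high-level class loss and a low-level feature loss might be combined into a single generator objective, a minimal Python sketch follows. It is not the authors' implementation: the function names, the flat-list data layout, the squared-distance form of the feature term, the log-ratio form of the class term, and the weights `alpha`, `beta`, and `gamma` are all assumptions made for illustration.

```python
import math


def feature_loss(adv_features, attention_maps, target_features):
    """Low-level term (assumed form): attention-weighted squared distance
    between the adversarial feature maps and a fixed target map, summed
    over scales. Each argument is a list of scales, and each scale is a
    flat list of floats of equal length."""
    total = 0.0
    for feats, attn, target in zip(adv_features, attention_maps, target_features):
        total += sum((a * (f - t)) ** 2 for f, a, t in zip(feats, attn, target))
    return total


def class_loss(true_class_probs, wrong_class_probs, eps=1e-12):
    """High-level term (assumed form): mean log-ratio of the true-class
    probability to a wrong-class probability over proposals. Minimizing
    it pushes each proposal toward the wrong label."""
    terms = [
        math.log(p_true + eps) - math.log(p_wrong + eps)
        for p_true, p_wrong in zip(true_class_probs, wrong_class_probs)
    ]
    return sum(terms) / len(terms)


def generator_objective(gan_loss, l2_loss, cls_loss, fea_loss,
                        alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the GAN term, a perturbation-size term, the class
    term, and the feature term. The weights are hypothetical."""
    return gan_loss + alpha * l2_loss + beta * cls_loss + gamma * fea_loss
```

In an actual training loop these terms would be computed on tensors by an automatic-differentiation framework and the generator would be updated by gradient descent on the combined value. The scalar version above only shows how the pieces fit together.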
