Detector Collapse: Physical-World Backdooring Object Detection to Catastrophic Overload or Blindness in Autonomous Driving
Hangtao Zhang, Shengshan Hu, Yichen Wang, Leo Yu Zhang, Ziqi Zhou, Xianlong Wang, Yanjun Zhang, Chao Chen
TL;DR
Detector Collapse (DC) introduces a novel backdoor paradigm for object detection (OD) that, when triggered, causes catastrophic detector failure by attacking both the regression and classification branches. It presents two attack strategies, Sponge and Blinding, together with a poisoning scheme that uses natural semantic triggers to enable physical-world activation and is optimized via MGDA without altering labels. On MS-COCO and VOC, DC outperforms state-of-the-art OD backdoors on detectors such as YOLOv5-s and Faster R-CNN, almost completely degrading $mAP$ on poisoned data and causing substantial inference slowdowns. The work further demonstrates physical-world viability using dynamic triggers generated by diffusion models, underscoring practical risks and the need for robust defenses against universal, real-world OD backdoors.
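MGDA (the multiple-gradient descent algorithm) is credited above with optimizing the attack. This text does not give the exact losses it balances. As an assumption, the sketch treats them as one gradient for a classification-branch objective and one for a regression-branch objective. It then applies the standard two-task closed form, which picks the minimum-norm convex combination $d = \alpha g_1 + (1-\alpha) g_2$ with $\alpha = \mathrm{clip}\big(\frac{(g_2 - g_1)\cdot g_2}{\lVert g_1 - g_2\rVert^2}, 0, 1\big)$. The function name `mgda_two_task` is hypothetical and is not the authors' code.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mgda_two_task(g_cls: Vector, g_reg: Vector) -> Tuple[float, Vector]:
    """Two-task MGDA: min-norm convex combination of two task gradients.

    Hypothetical helper: g_cls / g_reg stand in for gradients of a
    classification-branch and a regression-branch objective (assumption).
    Returns (alpha, d) with d = alpha * g_cls + (1 - alpha) * g_reg.
    """
    diff = [a - b for a, b in zip(g_cls, g_reg)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every convex combination is the same vector.
        alpha = 0.5
    else:
        # Closed-form minimiser of ||alpha*g_cls + (1-alpha)*g_reg||^2,
        # clipped so the combination stays convex.
        alpha = dot([b - a for a, b in zip(g_cls, g_reg)], g_reg) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_cls, g_reg)]
    return alpha, d


if __name__ == "__main__":
    # Orthogonal gradients are weighted equally.
    print(mgda_two_task([1.0, 0.0], [0.0, 1.0]))
    # Directly conflicting gradients cancel to (numerically) zero: a Pareto-stationary point.
    print(mgda_two_task([1.0, 0.0], [-1.0, 0.0]))
```

When $\alpha$ falls strictly inside $(0, 1)$, the resulting direction $d$ has the same inner product with both gradients. A step along $-d$ therefore does not trade one objective off against the other, and this is the usual motivation for using MGDA instead of a fixed weighted sum.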
Abstract
Object detection tasks, crucial in safety-critical systems such as autonomous driving, focus on pinpointing object locations. Object detectors are known to be susceptible to backdoor attacks. However, existing backdoor techniques have primarily been adapted from classification tasks, overlooking deeper vulnerabilities specific to object detection. This paper bridges this gap by introducing Detector Collapse (DC), a new backdoor attack paradigm tailored to object detection. DC is designed to instantly incapacitate detectors (i.e., to severely impair a detector's performance, culminating in denial of service). To this end, we develop two attack schemes: Sponge, which triggers widespread misidentifications, and Blinding, which renders objects invisible. Remarkably, we introduce a novel poisoning strategy that exploits natural objects, enabling DC to act as a practical backdoor in real-world environments. Our experiments on different detectors across several benchmarks show a significant improvement in attack efficacy over state-of-the-art attacks ($\sim$10%-60% absolute and $\sim$2-7$\times$ relative).
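Sponge is described as triggering widespread misidentifications. One plausible way a flood of spurious boxes turns into a slowdown is post-processing: greedy non-maximum suppression (NMS) compares each kept box against every remaining candidate. When candidates do not overlap, none are suppressed and the work grows as $N(N-1)/2$. This mechanism is an assumption, not a claim from the source. The sketch below is textbook greedy NMS with a hypothetical comparison counter. It is not the paper's code or any specific library's NMS.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], iou_thr: float = 0.5) -> Tuple[List[Box], int]:
    """Greedy NMS that also returns how many IoU comparisons it made."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    comparisons = 0
    while remaining:
        best = remaining.pop(0)  # highest-scoring box left
        kept.append(best)
        survivors = []
        for other in remaining:
            comparisons += 1
            if iou(best, other) <= iou_thr:  # not suppressed by `best`
                survivors.append(other)
        remaining = survivors
    return kept, comparisons


if __name__ == "__main__":
    # Many disjoint (spurious) boxes: nothing is suppressed, cost is quadratic.
    spread = [(10.0 * i, 0.0, 10.0 * i + 5.0, 5.0, 0.9) for i in range(200)]
    print(greedy_nms(spread)[1])
    # Heavily overlapping boxes on one object: one pass suppresses the rest.
    stacked = [(0.0, 0.0, 10.0, 10.0, 1.0 - 0.001 * i) for i in range(200)]
    print(greedy_nms(stacked)[1])
```

In this toy setting, 200 disjoint boxes cost 19,900 IoU comparisons, versus 199 for 200 boxes stacked on one object. Production detectors use vectorized or batched NMS, so the absolute numbers differ, but the dependence on the number of surviving candidates is the point being illustrated.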
