Out-of-the-box: Black-box Causal Attacks on Object Detectors
Melane Navaratnarajah, David A. Kelly, Hana Chockler
TL;DR
This work introduces BlackCAtt, a black-box causal attack framework for object detectors that leverages minimal sufficient pixel sets (MSPSs) to disrupt detections. By combining MSPS-aware perturbations with saliency maps from XAI tools, BlackCAtt operates without access to model internals and remains architecture-agnostic, demonstrating effectiveness across YOLO, Faster R-CNN, and RT-DETR. The authors show that many causal pixels lie outside bounding boxes, enabling imperceptible perturbations that reliably remove, alter, or add detections while keeping perceptual distortion low. Empirically, BlackCAtt outperforms undirected baselines on COCO by factors of 2.7x to 5.75x, depending on the attack goal, with MoG-based perturbations often yielding the strongest label changes and with robustness varying by detector architecture. The work offers a principled, explainable attack paradigm and highlights avenues for defense and for further study of cross-model generalization and multi-detection scenarios.
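The observation that many causal pixels lie outside the bounding box can be quantified directly once a causal pixel set is available. The sketch below is a minimal illustration, not part of BlackCAtt. It assumes the pixel set is given as a boolean (H, W) mask and that boxes use (x1, y1, x2, y2) pixel coordinates with exclusive upper bounds. The function name `fraction_outside_box` is hypothetical.

```python
import numpy as np


def fraction_outside_box(causal_mask: np.ndarray, box: tuple) -> float:
    """Share of causal pixels that fall outside an (x1, y1, x2, y2) box.

    causal_mask: boolean array of shape (H, W); True marks a pixel in the
    (assumed) minimal sufficient pixel set.
    box: pixel coordinates with x2 and y2 exclusive (an assumed convention).
    """
    x1, y1, x2, y2 = box
    inside = np.zeros_like(causal_mask, dtype=bool)
    inside[y1:y2, x1:x2] = True  # rows index y, columns index x
    total = int(causal_mask.sum())
    if total == 0:
        return 0.0  # no causal pixels, so nothing lies outside
    outside = int((causal_mask & ~inside).sum())
    return outside / total


# Usage: two causal pixels inside a 5x5 box, two outside it.
mask = np.zeros((10, 10), dtype=bool)
mask[2, 2] = True
mask[3, 3] = True
mask[8, 8] = True
mask[9, 0] = True
print(fraction_outside_box(mask, (0, 0, 5, 5)))  # 0.5
```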
Abstract
Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture-specific. More importantly, while they are often successful, it is rarely clear why they work. Insight into the mechanism behind this success would allow developers to understand and analyze these attacks, and to fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool that uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. BlackCAtt combines causal pixels with the bounding boxes produced by object detectors to create adversarial attacks that cause the loss, modification, or addition of a bounding box. BlackCAtt works across object detectors of different sizes and architectures, treating the detector as a black box. We compare the performance of BlackCAtt with other black-box attack methods and show that identifying causal pixels leads to more precisely targeted and less perceptible attacks. On the COCO test dataset, our approach is 2.7 times better than the baseline at removing a detection, 3.86 times better at changing a detection, and 5.75 times better at triggering new, spurious detections. The attacks generated by BlackCAtt remain very close to the original images and are hence imperceptible, demonstrating the power of causal pixels.
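The general idea of combining a saliency ranking with black-box detector queries to find a small sufficient pixel set, and then perturbing only that set, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the BlackCAtt algorithm.

- The detector is modeled as a hypothetical callable `detect(image) -> bool` that reports whether the target detection is still present.
- Images are floats in [0, 1].
- Masked-out pixels are replaced with a constant `baseline`.
- The greedy top-k search and the uniform `-delta` shift are simplifications chosen for clarity.

The names `approx_sufficient_set`, `perturb_on_set`, and `toy_detect` are all hypothetical.

```python
import numpy as np
from typing import Callable, Optional

# Hypothetical black-box interface: True if the target detection is still
# reported for the given image. Only queries are used, no gradients or weights.
Detector = Callable[[np.ndarray], bool]


def approx_sufficient_set(image: np.ndarray, saliency: np.ndarray,
                          detect: Detector, baseline: float = 0.0) -> Optional[np.ndarray]:
    """Greedy stand-in for a minimal sufficient pixel set.

    Keeps the k most salient pixels (all others set to `baseline`) for
    k = 1, 2, ... and returns the first boolean (H, W) mask for which the
    detector still fires, or None if no prefix suffices.
    """
    h, w = saliency.shape
    order = np.argsort(saliency, axis=None)[::-1]  # flat indices, most salient first
    for k in range(1, order.size + 1):
        keep = np.zeros(h * w, dtype=bool)
        keep[order[:k]] = True
        keep = keep.reshape(h, w)
        mask = keep if image.ndim == 2 else keep[..., None]  # broadcast over channels
        if detect(np.where(mask, image, baseline)):
            return keep
    return None


def perturb_on_set(image: np.ndarray, keep: np.ndarray, delta: float) -> np.ndarray:
    """Shift only the pixels in `keep` by -delta, clipping to [0, 1]."""
    mask = keep if image.ndim == 2 else keep[..., None]
    return np.clip(image - delta * mask, 0.0, 1.0)


# Toy "detector": fires when a 2x2 object region is bright enough.
def toy_detect(img: np.ndarray) -> bool:
    return bool(img[2:4, 2:4].sum() > 2.5)


image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0
saliency = image.copy()  # stand-in for an XAI saliency map
keep = approx_sufficient_set(image, saliency, toy_detect)
print(int(keep.sum()))  # 3 pixels suffice to keep the toy detection
print(toy_detect(image), toy_detect(perturb_on_set(image, keep, 0.6)))  # True False
```

In this toy example, perturbing only the three pixels of the sufficient set removes the detection while leaving the other 33 pixels untouched. In general, however, perturbing one sufficient set does not guarantee removal, because a detection may have several sufficient sets. A linear top-k scan also needs up to H×W queries, which is impractical for real images.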
