Out-of-the-box: Black-box Causal Attacks on Object Detectors

Melane Navaratnarajah, David A. Kelly, Hana Chockler

TL;DR

This work introduces BlackCAtt, a black-box causal attack framework for object detectors that uses minimal sufficient pixel sets (MSPSs) to disrupt detections. By combining MSPS-aware perturbations with saliency maps from XAI tools, BlackCAtt operates without access to internal model details and remains architecture-agnostic, demonstrating effectiveness across YOLO, Faster R-CNN, and RT-DETR. The authors show that many causal pixels lie outside bounding boxes, which enables imperceptible perturbations that reliably remove, alter, or add detections while keeping perceptual distortion low. Empirically, BlackCAtt outperforms undirected baselines by 2.3x to 5.75x across multiple attack goals on COCO, with MoG-based perturbations often yielding the strongest label changes and with robustness varying by detector architecture. The work offers a principled, explainable attack paradigm and highlights avenues for defense and for further study of cross-model generalization and multi-detection scenarios.

Abstract

Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture-specific. More importantly, while they are often successful, it is rarely clear why they work. Insights into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool that uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. BlackCAtt combines causal pixels with bounding boxes produced by object detectors to create adversarial attacks that lead to the loss, modification, or addition of a bounding box. BlackCAtt works across object detectors of different sizes and architectures, treating the detector as a black box. We compare the performance of BlackCAtt with other black-box attack methods and show that identification of causal pixels leads to more precisely targeted and less perceptible attacks. On the COCO test dataset, our approach is 2.7 times better than the baseline in removing a detection, 3.86 times better in changing a detection, and 5.75 times better in triggering new, spurious detections. The attacks generated by BlackCAtt are very close to the original image, and hence imperceptible, demonstrating the power of causal pixels.
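To make the idea of combining causal pixels with a bounding box concrete, the following is a minimal sketch, not the authors' implementation. It assumes a causal pixel mask (a stand-in for an MSPS) and a box are already available, and it blacks out only the causal pixels that fall outside the box. The function names `box_mask`, `perturb_outside_box`, and `l2_distance`, the toy image, and the constant fill value are all hypothetical choices made for illustration; the detector query that would confirm the attack is omitted.

```python
import numpy as np

def box_mask(shape, box):
    """Boolean mask that is True inside the (x0, y0, x1, y1) box."""
    mask = np.zeros(shape, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def perturb_outside_box(image, causal_mask, box, fill=0.0):
    """Overwrite causal pixels lying outside the box with a constant value.

    Assumption: `causal_mask` marks the pixels of a minimal sufficient
    pixel set; how it is computed is outside the scope of this sketch.
    """
    target = causal_mask & ~box_mask(causal_mask.shape, box)
    attacked = image.copy()
    attacked[target] = fill  # applies to every channel of the selected pixels
    return attacked, target

def l2_distance(a, b):
    """L2 norm of the difference, used here as a simple distortion measure."""
    return float(np.linalg.norm(a.astype(np.float64) - b.astype(np.float64)))

# Toy data: an 8x8 RGB image, a hypothetical causal region, and a box inside it.
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 1.0, size=(8, 8, 3))
causal_mask = np.zeros((8, 8), dtype=bool)
causal_mask[1:7, 1:7] = True
box = (2, 2, 6, 6)

attacked, target = perturb_outside_box(image, causal_mask, box)
distortion = l2_distance(image, attacked)
```

In a real attack loop, `attacked` would be passed back to the detector, and the perturbed region would grow only if the detection survived. Pixels inside the box are left untouched here, which mirrors the stated goal of keeping the object itself visually intact.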

Paper Structure

This paper contains 15 sections, 4 equations, 14 figures, 13 tables, 3 algorithms.

Figures (14)

  • Figure 1: The MSPS for cat (\ref{fig:cat:msps}) reveals a dependency on the surrounding context. BlackCAtt starts with causal pixels outside of the bounding box and works inwards in order to maximize imperceptibility. In both \ref{fig:cat:blur} and \ref{fig:cat:black} the cat is still clearly present and complete, but YOLO no longer detects the cat. The attack works because BlackCAtt changes part of the cause of the detection.
  • Figure 2: The DC between bounding box and MSPS stays almost constant on the COCO dataset, regardless of YOLO confidence.
  • Figure 3: Causally explainable adversarial attacks on cake. Even though the bounding box takes up the majority of the image (\ref{fig:cake:main}), it is enough to perturb a small number of pixels outside the box in order to remove the detection. These pixels are part of the MSPS. In particular, the Gaussian blur attack in \ref{fig:cake:blur} is imperceptible.
  • Figure 4: Example of a trial in $\text{BlackCAtt}_{MoG}$. From top-left to bottom-right: original image overlaid with the responsibility for inside-MSPS and bbox, the top 7 peaks extracted, fitted MoG mask and, finally, the attacked image with no detection.
  • Figure 5: Success rate of different approaches in adding new spurious detections, with different models on the COCO dataset, for different thresholds of the $L_2$ norm. The techniques compared are noise, targeted noise, blended, DRISE$_{MoG}$ and MoG.
  • ...and 9 more figures
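
The pipeline described in the Figure 4 caption (responsibility map, peak extraction, fitted MoG mask, attacked image) can be illustrated with a rough sketch. This is an assumption-laden simplification and not the paper's algorithm: peaks are taken as the k largest values rather than true local maxima, each Gaussian is isotropic with a shared, hand-picked `sigma`, and the names `top_k_peaks`, `mog_mask`, and `blend` are hypothetical.

```python
import numpy as np

def top_k_peaks(resp, k):
    """Return the (row, col) positions of the k largest values in a 2-D map.

    Simplification: no local-maximum test or suppression is applied.
    """
    flat = np.argsort(resp, axis=None)[::-1][:k]
    return np.column_stack(np.unravel_index(flat, resp.shape))

def mog_mask(shape, peaks, sigma):
    """Sum one isotropic Gaussian per peak and scale the result to [0, 1]."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=np.float64)
    for r, c in peaks:
        mask += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    return mask / mask.max()

def blend(image, perturbed, mask):
    """Mix the original and a perturbed image, weighted per pixel by the mask."""
    w = mask[..., None]  # broadcast the 2-D mask over the channel axis
    return (1.0 - w) * image + w * perturbed

# Toy responsibility map with two strong locations (illustrative values only).
resp = np.zeros((16, 16))
resp[3, 4] = 1.0
resp[10, 12] = 0.8

peaks = top_k_peaks(resp, k=2)
mask = mog_mask(resp.shape, peaks, sigma=2.0)

image = np.full((16, 16, 3), 0.5)
perturbed = np.zeros_like(image)  # stand-in for any perturbed version
attacked = blend(image, perturbed, mask)
```

The caption mentions seven peaks; `k=2` is used here only to keep the toy example small. A smooth mask of this kind concentrates the change around high-responsibility locations and fades it out elsewhere, which is one plausible way to keep the overall distortion low.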