On Feasibility of Intent Obfuscating Attacks

Zhaobin Li, Patrick Shafto

TL;DR

This work investigates intent obfuscating attacks on object detectors, in which the attacker perturbs one object to disrupt another and so conceals the intended target. It uses Targeted Objectness Gradient (TOG), a gradient-based method, to run targeted and untargeted attacks against both 1- and 2-stage detectors on COCO. The study demonstrates feasibility across five detectors (YOLOv3, SSD, RetinaNet, Faster R-CNN, Cascade R-CNN) and identifies three key success factors: target object confidence, perturb object size, and perturb-target distance. Combining these factors significantly increases success rates. The paper discusses defensive implications favoring 2-stage detectors and raises broader legal and societal questions about plausible deniability in ML systems and accountability for adversarial actions.

Abstract

Intent obfuscation is a common tactic in adversarial situations, enabling the attacker to both manipulate the target system and avoid culpability. Surprisingly, it has rarely been implemented in adversarial attacks on machine learning systems. We are the first to propose using intent obfuscation to generate adversarial examples for object detectors: by perturbing another non-overlapping object to disrupt the target object, the attacker hides their intended target. We conduct a randomized experiment on 5 prominent detectors -- YOLOv3, SSD, RetinaNet, Faster R-CNN, and Cascade R-CNN -- using both targeted and untargeted attacks and achieve success on all models and attacks. We analyze the success factors characterizing intent obfuscating attacks, including target object confidence and perturb object sizes. We then demonstrate that the attacker can exploit these success factors to increase success rates for all models and attacks. Finally, we discuss main takeaways and legal repercussions.
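
The core mechanism in the abstract, confining the perturbation to a non-overlapping "perturb" object while the loss is aimed at the target object, can be sketched as one masked signed-gradient step. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`box_mask`, `boxes_overlap`, `masked_step`), the `(x1, y1, x2, y2)` box format, the step size, and the L-infinity budget `epsilon` are all hypothetical, and `grad` stands in for the gradient of an attack loss that a real detector would supply.

```python
import numpy as np


def box_mask(height, width, box):
    """Return an (H, W, 1) mask that is 1 inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    mask = np.zeros((height, width, 1), dtype=np.float32)
    mask[y1:y2, x1:x2, :] = 1.0
    return mask


def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def masked_step(image, original, grad, mask, step_size, epsilon):
    """Take one signed-gradient descent step restricted to the masked region.

    `grad` is assumed to be d(attack loss)/d(image); pixels outside the
    mask are left untouched, so the target object itself is never edited.
    """
    stepped = image - step_size * np.sign(grad) * mask
    # Keep the perturbation inside an L-infinity ball around the original.
    stepped = np.clip(stepped, original - epsilon, original + epsilon)
    # Keep pixel values in the valid [0, 1] range.
    return np.clip(stepped, 0.0, 1.0)


# Toy usage: an 8x8 gray image, a perturb box and a disjoint target box.
original = np.full((8, 8, 3), 0.5, dtype=np.float32)
perturb_box = (0, 0, 4, 4)
target_box = (5, 5, 8, 8)
fake_grad = np.ones_like(original)  # placeholder for a detector gradient
adversarial = masked_step(
    original, original, fake_grad, box_mask(8, 8, perturb_box), 0.01, 0.03
)
```

In a full attack the step would be repeated, recomputing the gradient from the detector each time; only the masking and projection are shown here.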

Paper Structure (23 sections, 2 equations, 20 figures, 2 tables)

This paper contains 23 sections, 2 equations, 20 figures, 2 tables.

Figures (20)

  • Figure 1: A vanishing attack perturbs a sandwich (dotted blue box) and causes YOLOv3 to miss the targeted bottle (no orange boxes are seen).
  • Figure 2: A mislabeling attack perturbs a sink and causes SSD to mislabel the targeted oven as a microwave with 0.96 confidence.
  • Figure 3: An untargeted attack perturbs a person and causes Faster R-CNN to miss the kite (and baseball) and hallucinate objects like bananas.
  • Figure 5: Success factors can be exploited in combination to significantly increase success rates: We sampled target and perturb objects based on three validated success factors in Table \ref{tab:results_table} by targeting objects with low predicted confidence, perturbing large objects and selecting target and perturb objects close to one another. The binned summaries and regression trendlines graph success proportion against number of factors in the deliberate attack experiment. Errors are 95% confidence intervals and every point aggregates success over 200 images. Success rates significantly increase as the number of factors combined increases. Significance is determined at $\alpha < 0.05$ using a Wald z-test on the logistic estimates. Full details are given in Section \ref{sec:del_per}.
  • Figure 6: Perturbing an arbitrary region obfuscates intent with increased success for all models and attacks: We implement the intent obfuscating attack by perturbing an arbitrary non-overlapping square region to disrupt a randomly selected target object at various lengths and distances. The binned summaries and regression trendlines graph success proportion against perturb-target distance and perturb box length, both relative to image width or height, in the deliberate attack experiment. Errors are 95% confidence intervals and every point aggregates success over 200 images. The deliberate attack multiplies success compared with the randomized attack (Figure \ref{fig:success_trend_graph}), especially at close perturb-target distance and large perturb box length. Full details are given in Section \ref{sec:del_arb}.
  • ...and 15 more figures
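
The captions for Figures 5 and 6 report success proportions with 95% confidence intervals over 200 images per point, and significance from a Wald z-test on logistic estimates. The sketch below shows how such quantities are commonly computed. It is an assumption-laden illustration: the captions do not say which interval method was used, so a normal-approximation interval is assumed here, and the function names and the example numbers (50 successes, an estimate of 1.96 with standard error 1.0) are hypothetical rather than taken from the paper.

```python
import math


def proportion_ci(successes, trials, z=1.96):
    """Normal-approximation 95% interval for a binomial success proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def wald_z_test(estimate, std_error):
    """Two-sided Wald z-test for a single coefficient estimate."""
    z = estimate / std_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical bin: 50 successful attacks out of 200 images.
rate, lower, upper = proportion_ci(50, 200)

# Hypothetical logistic coefficient and its standard error.
z_stat, p_val = wald_z_test(1.96, 1.0)
```

With 200 images per point, the interval half-width is at most about 7 percentage points (reached at a 50% success rate), which gives a sense of how much separation between bins is needed before their intervals stop overlapping.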