On Feasibility of Intent Obfuscating Attacks
Zhaobin Li, Patrick Shafto
TL;DR
This work investigates intent obfuscating attacks on object detectors, in which the attacker perturbs one object in order to disrupt the detection of another, thereby concealing the true target. It uses the Targeted Objectness Gradient (TOG), a gradient-based method, to run both targeted and untargeted attacks against 1-stage and 2-stage detectors on COCO. The study demonstrates feasibility across five detectors (YOLOv3, SSD, RetinaNet, Faster R-CNN, Cascade R-CNN) and identifies key success factors: target confidence, perturbation size, and object proximity. It shows that an attacker who exploits these factors together achieves higher success rates. It also discusses defensive implications, which favor 2-stage detectors, and raises broader legal and societal questions about plausible deniability in ML systems and accountability for adversarial actions.
Abstract
Intent obfuscation is a common tactic in adversarial situations, enabling the attacker to both manipulate the target system and avoid culpability. Surprisingly, it has rarely been implemented in adversarial attacks on machine learning systems. We are the first to propose using intent obfuscation to generate adversarial examples for object detectors: by perturbing a different, non-overlapping object in order to disrupt the target object, the attacker hides their intended target. We conduct a randomized experiment on 5 prominent detectors (YOLOv3, SSD, RetinaNet, Faster R-CNN, and Cascade R-CNN) using both targeted and untargeted attacks, and achieve success on all models and attacks. We analyze the success factors characterizing intent obfuscating attacks, including target object confidence and perturb object sizes. We then demonstrate that the attacker can exploit these success factors to increase success rates for all models and attacks. Finally, we discuss main takeaways and legal repercussions.
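
To make the core idea concrete, the following is a minimal sketch of a perturbation that is confined to a "perturb" box which does not overlap the target box. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`box_mask`, `boxes_overlap`, `masked_sign_attack`), the step sizes, and the linear `toy_grad` stand-in for a detector's gradient are all hypothetical. The update is a generic signed-gradient step with an L-infinity budget.

```python
import numpy as np

def box_mask(shape, box):
    """Return an (H, W) mask that is 1 inside box = (x1, y1, x2, y2), else 0."""
    h, w = shape
    x1, y1, x2, y2 = box
    mask = np.zeros((h, w), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0
    return mask

def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def masked_sign_attack(image, grad_fn, perturb_box, eps=8 / 255, step=2 / 255, iters=10):
    """Signed-gradient descent on a target score, restricted to perturb_box.

    grad_fn(image) must return d(target score)/d(image), same shape as image.
    In a real attack it would come from backpropagating through a detector.
    """
    mask = box_mask(image.shape[:2], perturb_box)[..., None]  # broadcast over channels
    adv = image.copy()
    for _ in range(iters):
        grad = grad_fn(adv)
        adv = adv - step * np.sign(grad) * mask       # only pixels in the box move
        adv = np.clip(adv, image - eps, image + eps)  # stay within the L-inf budget
        adv = np.clip(adv, 0.0, 1.0)                  # stay a valid image
    return adv

# Toy setup (assumption): the target's score is linear in every pixel,
# mimicking a receptive field that extends beyond the target's own box.
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(32, 32, 3)).astype(np.float32)
weights = rng.normal(size=(32, 32, 3)).astype(np.float32)
toy_grad = lambda x: weights  # gradient of sum(weights * x) with respect to x

target_box, perturb_box = (2, 2, 10, 10), (18, 18, 30, 30)
assert not boxes_overlap(target_box, perturb_box)
adv = masked_sign_attack(image, toy_grad, perturb_box)
```

In this toy setting the target's score drops even though no pixel inside `target_box` changes, which is the property that lets the attacker obscure which object was intended.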
