A Generative Victim Model for Segmentation
Aixuan Li, Jing Zhang, Jiawei Shi, Yiran Zhong, Yuchao Dai
TL;DR
This paper addresses adversarial attacks on segmentation without relying on a task-specific victim model. It introduces a generative victim model, based on diffusion scores and derived from image generation. It defines three diffusion scores and, crucially, a weighted conditional score, $s(y|x^{adv}) = \omega\bigl(s_\theta(x^{adv}|y) - s_\theta(x^{adv})\bigr)$, which guides perturbation generation without direct access to segmentation gradients. A UNet-based conditional diffusion network is trained to estimate both the conditional and the unconditional scores. This enables effective, transferable attacks on camouflaged object detection (COD) and semantic segmentation, with or without queries. The results show competitive transferability across diverse backbones and tasks, highlighting a task-agnostic approach driven by the data distribution that offers practical flexibility in query use. A stated limitation is that, in multi-class settings, the score estimates are difficult to correlate with classification errors. By leveraging generative-score guidance, this work broadens the range of adversarial attack strategies and may inform defense design and robustness evaluation in real-world systems.
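The weighted conditional score follows from Bayes' rule. Since $\log p(y|x) = \log p(x|y) + \log p(y) - \log p(x)$, taking the gradient with respect to $x$ removes $\log p(y)$. This gives $\nabla_x \log p(y|x) = \nabla_x \log p(x|y) - \nabla_x \log p(x)$. For $\omega = 1$, the difference of the two diffusion scores is therefore exactly the conditional score, and $\omega$ rescales it. The Python sketch below shows how such a score could drive an attack. It is not the authors' implementation. Analytic isotropic Gaussian scores stand in for the trained UNet. The signed, $L_\infty$-projected (PGD-style) update, the step size `alpha`, the budget `eps`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_score(x, mean, var):
    """Analytic score (gradient of log-density w.r.t. x) of N(mean, var * I)."""
    return -(x - mean) / var


def weighted_conditional_score(score_cond, score_uncond, omega):
    """s(y|x) = omega * (s_theta(x|y) - s_theta(x)), as in the formula above."""
    return omega * (score_cond - score_uncond)


def pgd_step(x_adv, x_clean, cond_score, alpha, eps):
    """One signed step that lowers log p(y|x), followed by an L_inf projection.

    Assumed update rule: the attack loss is -log p(y|x), whose gradient
    w.r.t. x is -s(y|x), so we move along sign(-s(y|x)) and clip back into
    [x_clean - eps, x_clean + eps].
    """
    x_next = x_adv + alpha * np.sign(-cond_score)
    return np.clip(x_next, x_clean - eps, x_clean + eps)


def run_attack(x_clean, score_cond_fn, score_uncond_fn,
               omega=2.0, alpha=0.01, eps=0.03, steps=5):
    """Iterate score-guided steps; the score_*_fn callables stand in for a
    trained conditional/unconditional diffusion network (hypothetical)."""
    x_adv = x_clean.copy()
    for _ in range(steps):
        s = weighted_conditional_score(score_cond_fn(x_adv),
                                       score_uncond_fn(x_adv), omega)
        x_adv = pgd_step(x_adv, x_clean, s, alpha, eps)
    return x_adv


# Toy usage: hypothetical isotropic Gaussians replace the learned
# conditional score s_theta(x|y) and unconditional score s_theta(x).
x_clean = np.array([0.5, 1.5])
cond = lambda x: gaussian_score(x, mean=1.0, var=1.0)
uncond = lambda x: gaussian_score(x, mean=0.0, var=4.0)
x_adv = run_attack(x_clean, cond, uncond)
print(x_adv)
```

Stepping against $s(y|x^{adv})$ lowers the toy log-ratio $\log p(x|y) - \log p(x)$ for the label while staying inside the $\epsilon$-ball. Only score estimates are consulted, not a segmentation model's gradients. A real setting would replace the two lambdas with the outputs of a conditional diffusion network.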
Abstract
We find that well-trained victim models (VMs), against which attacks are generated, are a fundamental prerequisite for adversarial attacks; i.e., a segmentation VM is needed to generate attacks for segmentation. In this context, the victim model is assumed to be robust so that effective adversarial perturbations can be generated. Instead of focusing on improving the robustness of task-specific victim models, we shift our attention to image generation. From an image generation perspective, we derive a novel VM for segmentation, aiming to generate adversarial perturbations for segmentation tasks without requiring models explicitly designed for image segmentation. Our approach to adversarial attack generation diverges from conventional white-box and black-box attacks, offering a fresh outlook on adversarial attack strategies. Experiments show that our method generates effective adversarial attacks with good transferability.
