VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models

Hui Kuurila-Zhang, Haoyu Chen, Guoying Zhao

TL;DR

VENOM introduces a text-driven, diffusion-based framework for unrestricted adversarial example generation that unifies image content creation and adversarial synthesis within a single reverse diffusion process. It stabilizes adversarial guidance with a momentum-enhanced gradient and an adaptive control switch, preserving high image fidelity while maintaining strong attack success against defenses. The approach supports both natural adversarial examples (NAEs, generated from random noise) and unrestricted adversarial examples (UAEs, generated from reference images). It achieves higher image quality and a competitive attack success rate (ASR) in white-box settings, although transferability to black-box models involves trade-offs. This work advances adversarial robustness research by providing a flexible, purely text-driven way to generate realistic, effective adversarial examples and by informing defense design.
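The momentum-enhanced guidance and the adaptive switch can be illustrated with a minimal sketch. It assumes an MI-FGSM-style update (the running gradient is decayed by a factor `mu` and the new gradient is L1-normalized before being added) and a hypothetical switch rule that turns guidance off once the classifier already predicts the target label; VENOM's exact formulation may differ. The denoiser is omitted, and the toy linear classifier, `guided_step`, `scale`, and `mu` are illustrative names, not taken from the paper.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def target_log_prob_grad(x: np.ndarray, W: np.ndarray, target: int) -> np.ndarray:
    """Gradient of log p(target | x) for a toy linear classifier with logits W @ x."""
    p = softmax(W @ x)
    # d/dx log softmax(W x)[target] = W[target] - sum_k p_k W[k]
    return W[target] - p @ W


def guided_step(x, momentum, W, target, scale=0.5, mu=0.9):
    """One hypothetical adversarial-guidance step with momentum and an on/off switch.

    Returns the updated sample, the updated momentum buffer, and whether
    guidance was applied at this step.
    """
    if int(np.argmax(W @ x)) == target:
        # Assumed switch rule: once the classifier already predicts the target,
        # stop pushing so the sample follows the unguided trajectory.
        return x, momentum, False
    grad = target_log_prob_grad(x, W, target)
    # MI-FGSM-style accumulation: decay the history, add the L1-normalized gradient.
    momentum = mu * momentum + grad / (np.abs(grad).sum() + 1e-12)
    return x + scale * momentum, momentum, True
```

In a full pipeline, `x` would be the latent at each reverse-diffusion step and this update would be interleaved with denoising; switching guidance off once the attack succeeds is one way to limit how far the sample drifts from the natural-image trajectory.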

Abstract

Adversarial attacks have proven effective in deceiving machine learning models by subtly altering input images, motivating extensive research in recent years. Traditional methods constrain perturbations within $\ell_p$-norm bounds, but advances in Unrestricted Adversarial Examples (UAEs) allow for more complex, generative-model-based manipulations. Diffusion models now lead UAE generation owing to their greater stability and image quality compared with GANs. However, existing diffusion-based UAE methods are limited to using reference images and struggle to generate Natural Adversarial Examples (NAEs) directly from random noise, often producing uncontrolled or distorted outputs. In this work, we introduce VENOM, the first text-driven framework for high-quality unrestricted adversarial example generation through diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, enabling high-fidelity adversarial examples without sacrificing attack success rate (ASR). To stabilize this process, we incorporate an adaptive adversarial guidance strategy with momentum, ensuring that the generated adversarial examples $x^*$ align with the distribution $p(x)$ of natural images. Extensive experiments demonstrate that VENOM achieves superior ASR and image quality compared to prior methods, marking a significant advancement in adversarial example generation and providing insights into model vulnerabilities for improved defense development.
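For contrast with the unrestricted setting, the $\ell_p$-norm constraint used by traditional methods can be made concrete. The sketch below uses $p=\infty$, an illustrative choice since the abstract does not fix $p$: after each signed-gradient step (PGD-style), the adversarial image is clipped back into the box $\|x^* - x\|_\infty \le \epsilon$ and the valid pixel range. Unrestricted methods drop this projection and instead rely on the generative model to keep $x^*$ close to $p(x)$. The function names and the values of `eps` and `alpha` are assumptions.

```python
import numpy as np


def project_linf(x_adv: np.ndarray, x_clean: np.ndarray, eps: float = 8 / 255) -> np.ndarray:
    """Project x_adv onto the l_inf ball of radius eps around x_clean, then clip to [0, 1]."""
    x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)
    return np.clip(x_adv, 0.0, 1.0)


def linf_pgd_step(x_adv, x_clean, grad, alpha=2 / 255, eps=8 / 255):
    """One signed-gradient ascent step on the attack loss, followed by projection."""
    return project_linf(x_adv + alpha * np.sign(grad), x_clean, eps)
```

However many steps are taken, no pixel moves more than `eps` from the clean image, which is exactly the restriction that UAEs remove.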

Paper Structure (16 sections, 9 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: Samples of Natural Adversarial Examples (NAEs) generated by VENOM, conditioned on input text prompts and a designated target label. VENOM achieves a nearly 100% white-box attack success rate in generating NAEs while preserving high image fidelity.
  • Figure 2: Overview of the VENOM algorithm for generating NAEs (no reference images) and UAEs (with reference images). In NAE mode, the input $X_T$ is sampled from the standard Gaussian distribution $\mathcal{N}(0,1)$. In UAE mode, the input $X_T$ is obtained by applying the DDIM inversion equation to the reference image $X_0$. If the target label is unavailable, the class with the second-highest likelihood, excluding the ground truth, is assigned as the target label.
  • Figure 3: Because most existing adversarial attack methods operate only on given reference images, we generate the reference images from identical Gaussian noise, using basic class names as text prompts, so that the attack strategies shown in each column can be compared fairly. The top row shows clean images produced by the Stable Diffusion model without adversarial perturbation, serving as a reference. Corrupted NAEs are outlined with red borders for clear visual distinction.
  • Figure 4: UAEs generated with different attack methods from reference clean images. Please zoom in to compare details.
  • Figure 5: Ablation study evaluating the impact of the momentum (Mo) and adaptive control strategy (AS) modules.
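Two steps described in the Figure 2 caption can be sketched directly. The first is a deterministic DDIM inversion step, written here in its standard form $x_{t+1} = \sqrt{\bar\alpha_{t+1}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t+1}}\,\epsilon_\theta(x_t, t)$ with $\hat{x}_0 = \big(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)\big)/\sqrt{\bar\alpha_t}$; the paper's own equation is not reproduced in this summary, so this is the textbook form rather than necessarily VENOM's. The second is the fallback target-label rule, read here as "the most likely class other than the ground truth", which coincides with the second-highest class whenever the classifier ranks the ground truth first. `eps_model`, the `alpha_bar` schedule, and all function names are hypothetical.

```python
import numpy as np


def ddim_inversion_step(x_t, t, eps_model, alpha_bar):
    """Map x_t to x_{t+1} by running the deterministic (eta = 0) DDIM update in reverse."""
    eps = eps_model(x_t, t)
    a_t, a_next = alpha_bar[t], alpha_bar[t + 1]
    # Predicted clean sample from the current noisy sample and the noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Re-noise that prediction to the next (noisier) timestep.
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps


def ddim_invert(x0, eps_model, alpha_bar):
    """Invert a clean sample x0 to x_T; alpha_bar decreases from ~1 (t = 0) to small (t = T)."""
    x = x0
    for t in range(len(alpha_bar) - 1):
        x = ddim_inversion_step(x, t, eps_model, alpha_bar)
    return x


def select_target_label(probs, true_label):
    """Return the most likely class after excluding the ground-truth label."""
    masked = np.array(probs, dtype=float)
    masked[true_label] = -np.inf
    return int(np.argmax(masked))
```

With a zero noise predictor each step only rescales by $\sqrt{\bar\alpha_{t+1}/\bar\alpha_t}$, so the full inversion telescopes to $\sqrt{\bar\alpha_T/\bar\alpha_0}\,x_0$, which is a convenient sanity check.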