AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang; Junhong Ye; Xingjun Ma; Yige Li; Yunfan Yang; Yunhao Chen; Jitao Sang; Dit-Yan Yeung

TL;DR

Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks, and existing targeted methods require labeled supervision, limiting scalability. AnyAttack introduces a self-supervised framework that pre-trains a noise generator on the unlabeled LAION-400M dataset and then fine-tunes it on downstream tasks, enabling any image to be turned into an adversarial target for multiple VLMs. The approach demonstrates strong attacks across five open-source VLMs and transfers to four commercial systems, highlighting a broad and urgent security risk. This work significantly widens the attack surface of VLMs and motivates the development of robust defenses against scalable, self-supervised targeted attacks.

Abstract

Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks. Traditional targeted adversarial attacks require specific targets and labels, limiting their real-world impact. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks through a novel foundation model approach. By pre-training on the massive LAION-400M dataset without label supervision, AnyAttack achieves unprecedented flexibility, enabling any image to be transformed into an attack vector targeting any desired output across different VLMs. This approach fundamentally changes the threat landscape, making adversarial capabilities accessible at an unprecedented scale. Our extensive validation across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) demonstrates AnyAttack's effectiveness across diverse multimodal tasks. Most concerningly, AnyAttack transfers seamlessly to commercial systems, including Google Gemini, Claude Sonnet, Microsoft Copilot, and OpenAI GPT, revealing a systemic vulnerability that requires immediate attention.
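The abstract describes pre-training a noise generator without label supervision. The sketch below shows one way such a self-supervised objective can be set up, under assumptions that are not stated in this summary: a frozen image encoder embeds each image, a decoder maps that embedding to noise clipped to an L-infinity budget (ε = 16/255 here, an assumed value), the noise is added to a different clean image from the same batch, and the loss is one minus the cosine similarity between the perturbed image's embedding and the target embedding. The linear `W_enc`/`W_dec` matrices and all function names are hypothetical toy stand-ins. The sketch only evaluates the loss; actual pre-training would update the decoder by gradient descent in an autodiff framework.

```python
import numpy as np

# Hypothetical toy stand-ins: a frozen linear "image encoder" and a linear
# "noise decoder". A real attack would use a pre-trained vision encoder and a
# learned decoder network; these matrices only make the objective runnable.
rng = np.random.default_rng(0)
IMG_DIM, EMB_DIM = 48, 16
W_enc = rng.normal(size=(EMB_DIM, IMG_DIM)) / np.sqrt(IMG_DIM)
W_dec = rng.normal(size=(IMG_DIM, EMB_DIM))


def encode(images: np.ndarray) -> np.ndarray:
    """Map flattened images (B, IMG_DIM) to embeddings (B, EMB_DIM)."""
    return images @ W_enc.T


def generate_noise(target_emb: np.ndarray, eps: float) -> np.ndarray:
    """Decode target embeddings into noise, clipped to the L-inf ball of radius eps."""
    return np.clip(target_emb @ W_dec.T, -eps, eps)


def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (B, D) arrays."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_n * b_n, axis=1)


def pretrain_loss(batch: np.ndarray, eps: float = 16 / 255, shift: int = 1):
    """Assumed self-supervised objective: each image's own embedding is its target.

    Noise derived from one image is added to a different clean image from the
    same batch (a cyclic shift), and the loss rewards the perturbed image for
    embedding close to the original target image. No labels are used.
    """
    target_emb = encode(batch)                 # embeddings of target images
    delta = generate_noise(target_emb, eps)    # bounded adversarial noise
    clean = np.roll(batch, shift, axis=0)      # unrelated clean carrier images
    adv = np.clip(clean + delta, 0.0, 1.0)     # keep pixels in the valid range
    loss = np.mean(1.0 - cosine(encode(adv), target_emb))
    return loss, adv, clean


batch = rng.uniform(size=(8, IMG_DIM))
loss, adv, clean = pretrain_loss(batch)
print(f"loss={loss:.4f}, max |adv - clean| = {np.abs(adv - clean).max():.4f}")
```

Pairing target noise with an unrelated carrier image is what makes the objective label-free: the supervision signal is the target image's own embedding.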

Paper Structure

This paper contains 29 sections, 4 equations, 5 figures, and 5 tables.

Introduction
Related Work
Targeted Adversarial Attacks.
Jailbreak Attacks on VLMs.
Adversarial Attacks on VLMs.
Proposed Attack
Preliminaries and Adversary's Settings
Threat Model.
AnyAttack
Framework Overview.
Self-supervised Adversarial Noise Pre-training
Self-supervised Adversarial Noise Fine-tuning
Experiments
Experimental Setup
Baselines.
...and 14 more sections

Figures (5)

Figure 1: Comparison of existing targeted adversarial attack strategies and our proposed self-supervised method, AnyAttack.
Figure 2: Overview of the proposed AnyAttack: a self-supervised framework consisting of pre-training and fine-tuning stages.
Figure 3: Example responses from commercial VLMs to targeted attacks generated by our method.
Figure 4: Performance comparison between different configurations of AnyAttack for the image-text retrieval task on MSCOCO. The plot compares the decoder when initialized from scratch (Scratch), pre-trained (Pre), or fine-tuned (Cos and Bi), and shows the impact of auxiliary models (w/ Aux) and of the fine-tuning objective (Cos or Bi) on retrieval performance.
Figure 5: Comparison of memory usage and time consumption across different methods.
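The Figure 4 caption names two fine-tuning objectives, Cos and Bi, plus an auxiliary-model variant (w/ Aux). The sketch below illustrates how these objectives might differ, assuming (not confirmed by this summary) that Cos is one minus the cosine similarity between each adversarial embedding and its target, that Bi is a bidirectional (symmetric) InfoNCE-style contrastive loss over a batch, and that w/ Aux averages the loss over embeddings from several encoders. The function names and the temperature of 0.07 are hypothetical choices; the code only evaluates the losses on given embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cos_loss(adv_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """Assumed 'Cos' objective: mean of 1 - cosine(adv_i, target_i)."""
    sims = np.sum(l2_normalize(adv_emb) * l2_normalize(target_emb), axis=1)
    return float(np.mean(1.0 - sims))


def _log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def bi_loss(adv_emb: np.ndarray, target_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Assumed 'Bi' objective: symmetric InfoNCE where matched pairs lie on the diagonal.

    Each adversarial embedding must pick out its own target among the batch
    (rows), and each target must pick out its own adversarial embedding (columns).
    """
    logits = l2_normalize(adv_emb) @ l2_normalize(target_emb).T / temperature
    idx = np.arange(len(logits))
    adv_to_tgt = -_log_softmax(logits, axis=1)[idx, idx].mean()
    tgt_to_adv = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return float((adv_to_tgt + tgt_to_adv) / 2)


def ensemble_loss(loss_fn, adv_embs, target_embs) -> float:
    """Assumed 'w/ Aux' variant: average one objective over several encoders' embeddings."""
    return float(np.mean([loss_fn(a, t) for a, t in zip(adv_embs, target_embs)]))


rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))
noisy = target + 0.01 * rng.normal(size=(4, 8))
print("Cos:", cos_loss(noisy, target), "Bi:", bi_loss(noisy, target))
```

The design difference is that Cos scores each pair in isolation, while Bi also pushes each adversarial embedding away from the other targets in the batch, which is the behavior a retrieval task rewards.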
