Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
Chenhe Gu, Jindong Gu, Andong Hua, Yao Qin
TL;DR
Multimodal Large Language Models (MLLMs) are vulnerable to adversarial inputs, yet adversarial examples transfer poorly across models, especially when the attack targets a specific output. The authors propose the Dynamic Vision-Language Alignment (DynVLA) attack, which dynamically perturbs attention in the vision-language connector with a Gaussian kernel to diversify vision-language modality alignments and improve cross-model transferability. DynVLA operates within a PGD framework, perturbing attention around randomly selected image tokens with kernel size $m \in \{3,5\}$ and perturbation budget $\epsilon = 16/255$. It is effective across open-source models (BLIP2, InstructBLIP, MiniGPT4, LLaVA) and also affects closed-source models such as Gemini. Ablation studies show that both kernel size and perturbation budget influence transferability, suggesting that perturbing alignment rather than raw pixels yields stronger cross-model vulnerability. The work underscores robustness challenges for real-world deployment and points to defense strategies and future research directions for securing multimodal AI systems.
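The mechanism summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The summary only states that a Gaussian kernel perturbs attention around randomly selected image tokens inside a PGD loop, so everything else here is an assumption made for illustration: the square token grid, the kernel width `sigma`, the multiplicative form of the attention perturbation, the step size `alpha`, and all function names.

```python
import numpy as np


def gaussian_kernel(m, sigma=1.0):
    """Return an m x m Gaussian kernel scaled so its peak is 1 (m odd; sigma is an assumed value)."""
    ax = np.arange(m) - (m - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.max()


def gaussian_attention_bias(grid, m, rng, strength=1.0):
    """Place a Gaussian bump around one randomly chosen token of a grid x grid image-token map.

    Returns the flattened bias (length grid * grid) and the chosen (row, col) center.
    """
    half = m // 2
    cy, cx = (int(v) for v in rng.integers(0, grid, size=2))
    kernel = gaussian_kernel(m)
    bias = np.zeros((grid, grid))
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < grid and 0 <= x < grid:  # drop kernel entries that fall off the grid
                bias[y, x] = strength * kernel[dy + half, dx + half]
    return bias.ravel(), (cy, cx)


def perturb_attention(attn, bias):
    """Reweight attention over image tokens and renormalize each row.

    Multiplying by exp(bias) and renormalizing is equivalent to adding `bias`
    to the pre-softmax logits. This form is an assumption, not the paper's.
    """
    out = attn * np.exp(bias)
    return out / out.sum(axis=-1, keepdims=True)


def pgd_step(x_adv, x_clean, grad, alpha=1 / 255, eps=16 / 255):
    """One L-infinity PGD step that descends a targeted loss, then projects.

    `grad` is the gradient of the targeted loss with respect to `x_adv`,
    computed with the perturbed attention in place (alpha is an assumed step size).
    """
    x_new = x_adv - alpha * np.sign(grad)
    x_new = np.clip(x_new, x_clean - eps, x_clean + eps)  # stay inside the eps-ball
    return np.clip(x_new, 0.0, 1.0)  # stay a valid image
```

In a full attack loop, one would resample the bias at every iteration (for example `gaussian_attention_bias(16, 3, np.random.default_rng(0))`), run the surrogate model with `perturb_attention` applied inside its vision-language connector, and feed the resulting gradient to `pgd_step`. Resampling the center token each step is what makes the alignment perturbation dynamic in this sketch.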
Abstract
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. However, while MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially in the targeted attack setting. Existing methods primarily focus on vision-specific perturbations but struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across the diverse vision-language alignments of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
