QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang

TL;DR

QAVA introduces a query-agnostic adversarial attack on LVLMs by perturbing the input image to disrupt the visual-language alignment module, using randomly sampled, image-irrelevant questions to drive the perturbation. By focusing on the Q-former outputs and optimizing the proposed $\mathcal{L}_{\text{QAVA}}$ loss, the method achieves strong white-box and black-box performance and demonstrates substantial transferability across LVLMs and tasks, with notable efficiency gains over end-to-end attacks. The results reveal a practical security vulnerability in LVLMs and point to defense strategies such as alignment-module adversarial training and input-output gating for image-irrelevant queries. The work underscores the need for robust multimodal alignment in LVLMs and highlights potential risks for data poisoning and model deployment in real-world scenarios.

Abstract

In typical multimodal tasks, such as Visual Question Answering (VQA), adversarial attacks targeting a specific image and question can lead large vision-language models (LVLMs) to provide incorrect answers. However, a single image is commonly associated with multiple questions, and LVLMs may still answer the other questions correctly even when the image has been adversarially perturbed against one specific question. To address this, we introduce the query-agnostic visual attack (QAVA), which aims to create robust adversarial examples that elicit incorrect responses to unspecified and unknown questions. Compared to traditional adversarial attacks focused on specific images and questions, QAVA significantly enhances the effectiveness and efficiency of attacks on images when the question is unknown, achieving performance comparable to attacks on known target questions. Our research broadens the scope of visual adversarial attacks on LVLMs in practical settings and uncovers previously overlooked vulnerabilities. The code is available at https://github.com/btzyd/qava.

Paper Structure

This paper contains 24 sections, 2 equations, 4 figures, 18 tables, 1 algorithm.

Figures (4)

  • Figure 1: Traditional adversarial attacks input an image $x_i$ and a specified target question $x_{t,target}$ into LVLMs and generate adversarial images through gradient-based methods. This approach typically results in incorrect answers for $x_i$ and $x_{t,target}$ (i.e., Q1). However, for the other questions $x_{t,other}\in\{x_t\in\mathcal{T}\mid x_t\neq x_{t,target}\}$ in the question set $\mathcal{T}$, that is, those that differ from $x_{t,target}$, LVLMs may still provide correct answers (i.e., Q2-Q6). Our QAVA samples a set of questions $x_\text{t,QAVA}$ and performs attacks on these questions, even if they are unrelated to the original image $x_i$. QAVA generates adversarial images that are likely to produce incorrect responses when faced with unknown target questions.
  • Figure 2: The framework of QAVA. First, we randomly sample $N$ questions, denoted as $x_\text{t,QAVA}$, which are not pertinent to the input image $x_i$. Next, we add a random perturbation to $x_i$ to create the initial adversarial image, $x'_\text{i,QAVA}$. Both $x_i$ and $x_\text{t,QAVA}$ are then input into the LVLM, and the LVLM's response serves as a label; although the question $x_\text{t,QAVA}$ is unrelated to the image $x_i$, the LVLM still provides a response. We then input $x'_\text{i,QAVA}$ and $x_\text{t,QAVA}$ into the LVLM and calculate the MSE loss on the Q-former output features. Adversarial attacks are executed using techniques such as PGD or C&W with the loss function $\mathcal{L}_\text{QAVA}$. The traditional end-to-end attack loss function, $\mathcal{L}_\text{LLM}$, is also shown.
  • Figure 3: Visualization of image imperceptibility.
  • Figure 4: The clean images, the QAVA adversarial images, and the QAVA+SSAH adversarial images. All experiments are conducted using InstructBLIP Vicuna-7B with the attack $\mathcal{L}_\text{QAVA}$($\text{RSQ}_\text{10}$).
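
The loop described in the Figure 2 caption (random start, MSE on alignment features, PGD updates) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the attack maximizes the feature MSE between the perturbed and clean image, averaged over the sampled questions, and it replaces the Q-former with a hypothetical linear map per question (`weight_bank`) so that the gradient can be written by hand. The names `features`, `mse_and_grad`, and `pgd_feature_attack`, and the values of `eps` and `step`, are illustrative assumptions.

```python
import random

def features(weights, image):
    """Toy stand-in for question-conditioned alignment features: W @ image."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]

def mse_and_grad(weight_bank, clean, adv):
    """Feature MSE between adv and clean, averaged over questions, and its gradient w.r.t. adv."""
    loss = 0.0
    grad = [0.0] * len(adv)
    for weights in weight_bank:  # one hypothetical linear map per sampled question
        f_clean = features(weights, clean)
        f_adv = features(weights, adv)
        diff = [a - c for a, c in zip(f_adv, f_clean)]
        scale = 1.0 / (len(diff) * len(weight_bank))
        loss += scale * sum(d * d for d in diff)
        for row, d in zip(weights, diff):
            for j, w in enumerate(row):
                grad[j] += 2.0 * scale * d * w
    return loss, grad

def pgd_feature_attack(weight_bank, clean, eps=8 / 255, step=2 / 255, iters=20, seed=0):
    """L-infinity PGD that ascends the feature MSE (assumed objective, see lead-in)."""
    rng = random.Random(seed)
    # Random start: at adv == clean the MSE gradient is exactly zero, so a
    # sign-gradient step would have no direction to follow.
    adv = [min(1.0, max(0.0, p + rng.uniform(-eps, eps))) for p in clean]
    for _ in range(iters):
        _, grad = mse_and_grad(weight_bank, clean, adv)
        stepped = [a + step * ((g > 0) - (g < 0)) for a, g in zip(adv, grad)]
        # Project back into the eps-ball around the clean image and into [0, 1].
        adv = [min(1.0, max(0.0, min(p + eps, max(p - eps, s))))
               for p, s in zip(clean, stepped)]
    return adv

# Toy data: a 16-pixel "image", 3 sampled questions, 4 feature dimensions each.
data_rng = random.Random(1)
dim, feat_dim, num_questions = 16, 4, 3
weight_bank = [[[data_rng.gauss(0, 1) for _ in range(dim)] for _ in range(feat_dim)]
               for _ in range(num_questions)]
clean = [data_rng.random() for _ in range(dim)]
adv = pgd_feature_attack(weight_bank, clean)
final_loss, _ = mse_and_grad(weight_bank, clean, adv)
```

In a real LVLM the hand-written gradient would be replaced by automatic differentiation through the vision encoder and Q-former. Because the loss in this sketch stops at the feature level, the language model never appears in the backward pass.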