Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models

Jonggyu Jang, Hyeonsu Lyu, Jungyeon Koh, Hyun Jong Yang

TL;DR

This work proposes a novel adversarial attack procedure, Replace-then-Perturb, and a contrastive learning-based adversarial loss, Contrastive-Adv, to obtain better adversarial examples against VLMs.

Abstract

Conventional targeted adversarial attacks add a small perturbation to an image so that a neural network model classifies the image as a predefined target class, even if that is not the correct class. Recently, for vision-language models (VLMs), the focus of targeted adversarial attacks has been to generate a perturbation that makes VLMs produce intended target text outputs. For example, such attacks aim to add a small perturbation to an image so that a VLM's answer changes from "there is an apple" to "there is a baseball." However, producing only the intended text output is insufficient for tricky questions like "if there is a baseball, tell me what is below it." This is because the target of the adversarial attack does not consider the overall integrity of the original image, which leads to a lack of visual reasoning. In this work, we focus on generating targeted adversarial examples with visual reasoning against VLMs. To this end, we propose 1) a novel adversarial attack procedure, Replace-then-Perturb, and 2) a contrastive learning-based adversarial loss, Contrastive-Adv. In Replace-then-Perturb, we first leverage a text-guided segmentation model to find the target object in the image. Then, we remove the target object and inpaint the empty space according to the desired prompt. By doing this, we can generate a target image corresponding to the desired prompt while maintaining the overall integrity of the original image. Furthermore, in Contrastive-Adv, we design a novel loss function to obtain better adversarial examples. Our extensive benchmark results demonstrate that Replace-then-Perturb and Contrastive-Adv outperform the baseline adversarial attack algorithms. The source code to reproduce the results will be made available.
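The replace step described in the abstract (segment the target object, remove it, inpaint the hole) can be illustrated with a minimal sketch. The functions `toy_segment`, `toy_inpaint`, and `build_target_image` are hypothetical stand-ins introduced only for illustration: a real pipeline would presumably use a text-guided segmentation model and a prompt-conditioned inpainting model, whereas this sketch thresholds the red channel and fills the region with a flat color.

```python
import numpy as np

def toy_segment(image, threshold=0.8):
    """Hypothetical stand-in for a text-guided segmentation model.

    Marks pixels whose red channel exceeds `threshold` as the target object.
    """
    return image[..., 0] > threshold

def toy_inpaint(image, mask, fill_color):
    """Hypothetical stand-in for prompt-guided inpainting.

    Fills the masked region with a flat color instead of generated content.
    """
    out = image.copy()
    out[mask] = fill_color
    return out

def build_target_image(image, fill_color=(1.0, 1.0, 1.0)):
    """Replace only the masked object; every other pixel is kept unchanged."""
    mask = toy_segment(image)
    inpainted = toy_inpaint(image, mask, np.asarray(fill_color, dtype=image.dtype))
    # Composite: inpainted pixels inside the mask, original pixels outside it.
    target = np.where(mask[..., None], inpainted, image)
    return target, mask

# Toy scene: gray background with a 3x3 red "apple".
image = np.full((8, 8, 3), 0.3)
image[2:5, 2:5] = (0.9, 0.1, 0.1)
target, mask = build_target_image(image)
```

The point of the composite is that the target image differs from the original only inside the mask, which is what lets the surrounding scene stay intact for follow-up questions about the replaced object.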

Paper Structure

This paper contains 23 sections, 18 equations, 5 figures, 7 tables, 2 algorithms.

Figures (5)

  • Figure 1: An example comparing embedding-based adversarial attacks and the proposed method. The original image depicts books, alphabet blocks, pencils, a drawing, and an apple. The target object is the apple in the image, and the desired prompt is "a baseball." (Left) In embedding-based adversarial attacks, the image is recognized as a baseball; however, due to a lack of visual reasoning, the VLMs provide unnatural outputs (Q2, Q3). (Right) In the proposed method, which incorporates visual reasoning, the VLMs generate natural outputs, correctly replacing the target object (apple) with a baseball.
  • Figure 2: An illustration of the detailed procedure of Replace-then-Perturb, where the target object is the stop sign in the original image. In this adversarial attack, we aim to change the stop sign into a 50 mph speed limit sign.
  • Figure 3: Contrastive-Adv algorithm
  • Figure 4: Graphical examples of the experimental results, where the target neural network model is LLaVA 1.5. We depict four examples, with the target object and the target prompt indicated below each image. The results show that the latent-based adversarial examples (baseline) fail to account for the visual reasoning related to the adversarial changes. In the first example, for the third question (Q3), the baseline answers that the image contains the head of a person, because the target prompt 'helmet' causes the perturbed image to deceive the target model into recognizing a person wearing a helmet. In the second to fourth examples, the adversarial examples generated by the baseline often lead VLMs to interpret the image in an unintended direction, whereas the proposed method accurately modifies the target of the adversarial perturbation.
  • Figure 5: Illustrations of the adversarial examples generated by our proposed method (Replace-then-Perturb and Contrastive-Adv), where $\epsilon=16/255$ and $T=200$. We depict the original image, the perturbed image, and the added noise, where the added noise is amplified 10 times for visualization.
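To make the Figure 5 settings concrete, here is a minimal sketch of a perturb step under an $\ell_\infty$ budget of $\epsilon=16/255$. It reads $T=200$ as the number of attack iterations, which is an assumption. It is not Contrastive-Adv: the loss is a plain squared distance between embeddings, the encoder is a toy linear map `W`, and the step size `alpha` and all function names are hypothetical choices made only for illustration. The helper `amplify_noise` mirrors the 10 times amplification used to display the added noise.

```python
import numpy as np

def embedding_loss_and_grad(x_adv, z_target, W):
    """Loss 0.5 * ||W x_adv - z_target||^2 and its gradient w.r.t. x_adv."""
    residual = W @ x_adv.ravel() - z_target
    loss = 0.5 * float(residual @ residual)
    grad = (W.T @ residual).reshape(x_adv.shape)
    return loss, grad

def pgd_linf(x, x_target, W, eps=16 / 255, steps=200, alpha=1 / 255):
    """Signed-gradient PGD pulling the embedding of x + delta toward x_target's.

    The linear encoder and squared-distance loss are illustrative assumptions.
    Returns the lowest-loss perturbation seen and its loss.
    """
    z_target = W @ x_target.ravel()
    delta = np.zeros_like(x)
    best_loss, _ = embedding_loss_and_grad(x, z_target, W)
    best_delta = delta.copy()
    for _ in range(steps):
        _, grad = embedding_loss_and_grad(x + delta, z_target, W)
        # Descent step, then projection onto the eps-ball.
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x
        loss, _ = embedding_loss_and_grad(x + delta, z_target, W)
        if loss < best_loss:
            best_loss, best_delta = loss, delta.copy()
    return best_delta, best_loss

def amplify_noise(delta, factor=10.0):
    """Center the noise at mid-gray and scale it so it is visible when plotted."""
    return np.clip(0.5 + factor * delta, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(4, 4, 3))         # toy "original" image
x_target = rng.uniform(0.2, 0.8, size=(4, 4, 3))  # toy "target" image
W = rng.normal(size=(8, x.size)) / np.sqrt(x.size)
initial_loss, _ = embedding_loss_and_grad(x, W @ x_target.ravel(), W)
delta, final_loss = pgd_linf(x, x_target, W)
noise_vis = amplify_noise(delta)
```

In the setting the paper describes, `x_target` would be the inpainted target image produced by the replace step, and the encoder would be the VLM's vision encoder rather than a random linear map.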

Theorems & Definitions (1)

  • Remark 1: Why do we not apply the augmentation block to the perturbation image?