Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images

Arka Daw, Megan Hong-Thanh Chung, Maria Mahbub, Amir Sadovnik

TL;DR

This paper introduces Hiding-in-Plain-Sight (HiPS) attacks, a novel class of adversarial attacks that subtly modify model predictions by selectively concealing target object(s), as if the target object were absent from the scene.

Abstract

Machine learning models are known to be vulnerable to adversarial attacks, but traditional attacks have mostly focused on single modalities. With the rise of large multi-modal models (LMMs) like CLIP, which combine vision and language capabilities, new vulnerabilities have emerged. However, prior work on multimodal targeted attacks aims to completely change the model's output to what the adversary wants. In many realistic scenarios, an adversary might seek to make only subtle modifications to the output, so that the changes go unnoticed by downstream models or even by humans. We introduce Hiding-in-Plain-Sight (HiPS) attacks, a novel class of adversarial attacks that subtly modify model predictions by selectively concealing target object(s), as if the target object were absent from the scene. We propose two HiPS attack variants, HiPS-cls and HiPS-cap, and demonstrate their effectiveness in transferring to downstream image captioning models, such as CLIP-Cap, for targeted object removal from image captions.

Paper Structure

This paper contains 12 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: A schematic illustration of the Hiding-in-Plain-Sight (HiPS-cap) Attack.
  • Figure 2: Qualitative results comparing various methods, with the target shown as red words in the caption.
  • Figure 3: Comparing the effect of attack budget $\epsilon$ on the different attack success metrics for HiPS-cls and HiPS-cap attacks using FGSM and PGD with $L_\infty$ norm.
  • Figure 4: Sensitivity of the HiPS-cls and HiPS-cap attacks to the hyperparameter $\lambda_1$.
  • Figure 5: Comparing the effect of attack budget $\epsilon$ on the different image quality metrics for HiPS-cls and HiPS-cap attacks using FGSM and PGD with $L_\infty$ norm.
  • ...and 2 more figures
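
The figure captions refer to FGSM and PGD attacks under an $L_\infty$ budget $\epsilon$. As a rough illustration of what that budget means, the sketch below runs both attacks against a toy linear "target object" score. It is not the HiPS-cls or HiPS-cap objective, which this page does not specify: the names `target_score`, `target_score_grad`, `fgsm_linf`, and `pgd_linf`, the linear model, and the step sizes are all assumptions made for the example.

```python
import numpy as np

def fgsm_linf(x, grad_fn, eps):
    """One-step attack: shift every pixel by eps against the gradient sign."""
    x_adv = x - eps * np.sign(grad_fn(x))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range

def pgd_linf(x, grad_fn, eps, step, n_steps):
    """Iterated signed-gradient descent, projected back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv - step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # L_inf projection
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
    return x_adv

# Toy stand-in for "how strongly the model sees the target object":
# a linear score w . x (hypothetical, NOT the loss used in the paper).
rng = np.random.default_rng(0)
w = rng.normal(size=16)
x = rng.uniform(0.2, 0.8, size=16)

def target_score(z):
    return float(w @ z)

def target_score_grad(z):
    return w  # gradient of a linear score is constant

eps = 8 / 255
x_fgsm = fgsm_linf(x, target_score_grad, eps)
x_pgd = pgd_linf(x, target_score_grad, eps, step=2 / 255, n_steps=10)

print("clean score:", target_score(x))
print("FGSM score: ", target_score(x_fgsm))
print("PGD score:  ", target_score(x_pgd))
```

Both attacks lower the toy score while changing no pixel by more than $\epsilon$. On a real, non-linear model the multi-step PGD variant would generally find stronger perturbations than single-step FGSM for the same budget; for this linear toy the two coincide.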