A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model

Sampriti Soor, Alik Pramanick, Jothiprakash K, Arijit Sur

TL;DR

The paper addresses adversarial vulnerabilities in vision-language models by proposing a generative attack guided by CLIP that produces highly imperceptible perturbations, especially in multi-object scenes. By integrating SSAE’s concentration strategy with CLIP-based contrastive objectives inspired by GAMA, the method achieves competitive deception of victim classifiers while maintaining high structural similarity to originals. Evaluations on CIFAR‑10, Imagenette, and Pascal VOC across multiple surrogate/target models demonstrate robust transferability and superior visual fidelity relative to prior methods. This approach advances practical adversarial attack design with strong stealth characteristics and broad applicability, including potential extensions to detection and segmentation tasks.

Abstract

The rapid growth of deep learning has produced powerful models that handle a variety of tasks, such as image recognition and language understanding. However, adversarial attacks, which apply alterations too subtle to be noticed, can deceive these models into making inaccurate predictions. In this paper, a generative adversarial attack method is proposed that uses the CLIP model to create highly effective and visually imperceptible adversarial perturbations. CLIP's ability to align text and image representations lets the method incorporate natural language semantics through a guided loss, generating effective adversarial examples that look identical to the original inputs. This integration allows extensive scene manipulation, creating perturbations in multi-object environments specifically designed to deceive multilabel classifiers. Our approach combines the concentrated perturbation strategy of the Saliency-based Auto-Encoder (SSAE) with dissimilar text embeddings, in the manner of Generative Adversarial Multi-Object Scene Attacks (GAMA), resulting in perturbations that both deceive classification models and maintain high structural similarity to the original images. The model was tested on various tasks across diverse black-box victim models. The experimental results show that our method performs competitively, achieving comparable or superior results to existing techniques, while preserving greater visual fidelity.

Paper Structure

This paper contains 13 sections, 7 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Architecture of the proposed adversarial attack model. The Frobenius loss ensures a concentrated perturbation, the norm loss minimizes pixel-wise differences, and the contrastive loss, computed on CLIP embeddings, ensures feature dissimilarity between the raw and perturbed images.
  • Figure 2: Examples from the CIFAR-10 dataset showing that perturbed images generated by the proposed method retain higher structural similarity to the raw images than perturbed images generated by GAMA.
  • Figure 3: Qualitative comparison between clean images (top row) and their corresponding perturbed images (bottom row) generated by the proposed method, with samples taken from the Pascal-VOC dataset.
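
The Figure 1 caption names three loss terms: a Frobenius loss, a norm loss, and a contrastive loss on CLIP embeddings. The sketch below shows one way such terms could be combined into a single objective. It is a minimal illustration under stated assumptions, not the authors' formulation: the Frobenius norm is applied directly to the perturbation, the norm loss is taken as a mean squared pixel difference, the contrastive term is simplified to a cosine similarity that the generator would minimize, and the embeddings are plain vectors standing in for CLIP outputs. The function names and the weights `w_frob`, `w_norm`, and `w_con` are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a 2-D list: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def mean_squared_difference(clean, perturbed):
    """Mean squared pixel-wise difference between two same-sized 2-D images."""
    diffs = [
        (p - c) ** 2
        for row_c, row_p in zip(clean, perturbed)
        for c, p in zip(row_c, row_p)
    ]
    return sum(diffs) / len(diffs)


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_loss(clean, perturbed, emb_clean, emb_perturbed,
                  w_frob=1.0, w_norm=1.0, w_con=1.0):
    """Weighted sum of the three illustrative terms (weights are assumptions)."""
    # Perturbation = perturbed image minus clean image, element by element.
    delta = [
        [p - c for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(clean, perturbed)
    ]
    frob = frobenius_norm(delta)                          # assumed Frobenius term
    norm = mean_squared_difference(clean, perturbed)      # assumed norm term
    # Simplified stand-in for the contrastive term: lower similarity between
    # clean and perturbed embeddings means more dissimilar features.
    contrast = cosine_similarity(emb_clean, emb_perturbed)
    return w_frob * frob + w_norm * norm + w_con * contrast


if __name__ == "__main__":
    clean = [[0.0, 0.0], [0.0, 0.0]]
    perturbed = [[3.0, 0.0], [0.0, 4.0]]
    # Identical embeddings: the similarity term contributes its maximum of 1.0.
    print(combined_loss(clean, perturbed, [1.0, 0.0], [1.0, 0.0]))
    # Orthogonal embeddings: the similarity term drops to 0.0.
    print(combined_loss(clean, perturbed, [1.0, 0.0], [0.0, 1.0]))
```

In this toy setup the first two terms pull the perturbed image toward the clean one, while the similarity term rewards embeddings that drift apart, which is the trade-off between imperceptibility and deception that the caption describes. In practice the embeddings would come from a CLIP image encoder and the terms would be computed on tensors inside a training loop.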