Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon
TL;DR
PDCL-Attack leverages CLIP's joint image-text space and prompt learning to train a perturbation generator that produces transferable adversarial perturbations. The method uses a three-phase pipeline: Phase 1 learns a Prompter to generate robust text features; Phase 2 trains a perturbation generator with a prompt-driven contrastive loss $\mathcal{L}_{\mathrm{PDCL}} = \|\boldsymbol{\phi}'_s - \boldsymbol{\tau}'_s\|_2^2 + \max(0, \alpha - \|\boldsymbol{\phi}'_s - \boldsymbol{\tau}_s\|_2)^2$ and an image-based surrogate loss $\mathcal{L}_{\mathrm{surr}}$; Phase 3 freezes the generator for inference on unseen domains and models. Extensive cross-domain and cross-model experiments on ImageNet-1K show the approach surpasses prior generative attacks, with gains amplified by using learned prompts and CLIP-derived text guidance. The work highlights the risk posed by multimodal foundation models in adversarial contexts and motivates developing robust defenses against such transfer attacks.
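The $\mathcal{L}_{\mathrm{PDCL}}$ formula above can be evaluated for a single sample with the minimal sketch below. This summary does not define the symbols, so the argument names are assumptions for illustration only: `phi` stands for $\boldsymbol{\phi}'_s$, `tau_pull` for $\boldsymbol{\tau}'_s$ (the feature in the squared-distance term), `tau_push` for $\boldsymbol{\tau}_s$ (the feature in the margin term), and `alpha` for the margin $\alpha$. It uses plain Python lists rather than CLIP features and is not the authors' implementation.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pdcl_loss(phi, tau_pull, tau_push, alpha):
    """Per-sample contrastive loss following the formula in the TL;DR.

    Argument names are hypothetical:
      phi      -> phi'_s  (feature being optimized)
      tau_pull -> tau'_s  (feature in the squared-distance term)
      tau_push -> tau_s   (feature in the margin term)
      alpha    -> margin
    """
    # First term: squared L2 distance, which pulls phi toward tau_pull.
    pull = l2_distance(phi, tau_pull) ** 2
    # Second term: squared hinge, which is nonzero only while phi is
    # closer than alpha to tau_push, so it pushes phi away up to the margin.
    push = max(0.0, alpha - l2_distance(phi, tau_push)) ** 2
    return pull + push
```

With `phi = [0.0, 0.0]`, `tau_pull = [3.0, 4.0]`, `tau_push = [1.0, 0.0]`, and `alpha = 2.0`, the first term is $5^2 = 25$ and the second is $(2 - 1)^2 = 1$. Once the distance to `tau_push` exceeds `alpha`, the second term vanishes and only the first term contributes.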
Abstract
Recent vision-language foundation models, such as CLIP, have demonstrated superior capabilities in learning representations that transfer across a diverse range of downstream tasks and domains. With the emergence of such powerful models, it has become crucial to effectively leverage their capabilities in tackling challenging vision tasks. On the other hand, only a few works have focused on devising adversarial examples that transfer well to both unknown domains and model architectures. In this paper, we propose a novel transfer attack method called PDCL-Attack, which leverages the CLIP model to enhance the transferability of adversarial perturbations generated by a generative model-based attack framework. Specifically, we formulate an effective prompt-driven feature guidance by harnessing the semantic representation power of text, particularly from the ground-truth class labels of input images. To the best of our knowledge, we are the first to introduce prompt learning to enhance transferable generative attacks. Extensive experiments conducted across various cross-domain and cross-model settings empirically validate our approach, demonstrating its superiority over state-of-the-art methods.
