
Few-Shot Adversarial Prompt Learning on Vision-Language Models

Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu

TL;DR

A few-shot adversarial prompt framework in which adapting input sequences with limited data significantly improves adversarial robustness, together with a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples.

Abstract

The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Inspired by the success of vision-language foundation models, previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. In practice, however, these approaches remain unsatisfactory because of several issues, including heavy adaptation cost, suboptimal text supervision, and uncontrolled natural generalization capacity. To address these issues, we propose a few-shot adversarial prompt framework in which adapting input sequences with limited data significantly improves adversarial robustness. Specifically, we achieve this by providing adversarially correlated text supervision that is learned end-to-end from adversarial examples. We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples. The proposed framework makes it possible to learn adversarial text supervision, which provides superior cross-modal adversarial alignment and matches state-of-the-art zero-shot adversarial robustness with only 1% of the training data. Code is available at: https://github.com/lionel-w2/FAP.
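
To make the shape of such an objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible combination: a cross-entropy term on natural examples, plus a KL term that pulls the natural and adversarial image-text similarity distributions together, weighted by the cosine similarity between the natural and adversarial image features so that reducing that uni-modal similarity lowers the loss. The function names, the `scale` temperature, the weight `lam`, and the exact way the terms are combined are all illustrative assumptions; the abstract does not specify them.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_logits(image_feat, text_feats, scale=10.0):
    # Cross-modal logits: scaled cosine similarity to each class text feature.
    # `scale` is a hypothetical temperature, not a value from the paper.
    return [scale * cosine(image_feat, t) for t in text_feats]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def kl_divergence(p_logits, q_logits):
    # KL(p || q) between the softmax distributions of two logit vectors.
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sketch_objective(nat_feat, adv_feat, text_feats, label, lam=1.0):
    # ASSUMPTION: this combination of terms is illustrative only.
    nat_logits = similarity_logits(nat_feat, text_feats)
    adv_logits = similarity_logits(adv_feat, text_feats)
    natural_term = cross_entropy(nat_logits, label)
    # Multi-modal consistency: natural vs. adversarial similarity distributions.
    consistency = kl_divergence(nat_logits, adv_logits)
    # Uni-modal term in [0, 2]; smaller when the two image features differ more.
    unimodal_similarity = cosine(nat_feat, adv_feat) + 1.0
    return natural_term + lam * consistency * unimodal_similarity
```

For example, calling `sketch_objective([1.0, 0.0], [0.8, 0.6], [[1.0, 0.0], [0.0, 1.0]], label=0)` returns the natural cross-entropy plus a positive robust penalty, while passing identical natural and adversarial features makes the penalty vanish. In a real setup the features would come from the frozen image and text encoders with learnable prompt tokens, and the adversarial feature from an attacked copy of the input.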

Paper Structure (33 sections, 9 equations, 9 figures, 16 tables, 2 algorithms)

This paper contains 33 sections, 9 equations, 9 figures, 16 tables, 2 algorithms.

Figures (9)

  • Figure 1: The overview of the proposed Few-shot Adversarial Prompt learning (FAP) framework. Note that only prompt tokens as well as the deep projections from image to text are tuned while the rest of the model is frozen. Our method promotes a consistent cross-modal similarity distribution between natural and adversarial examples, while encouraging differences in uni-modal representations. The adversarial-aware text supervision learned in this manner can better align adversarial features and establish robust decision boundaries with a limited number of examples. The natural and adversarial forward processes of the image encoder share parameters.
  • Figure 2: Visualization of the natural image embedding, adversarial image embedding, and text embedding after tuning with and without the adversarial-aware term. Images are sampled from the same class in the Caltech101 dataset (Fei-Fei et al., 2004).
  • Figure 3: Accuracy (%) of adversarial few-shot learning on 11 datasets. The dots represent the result of each experiment, and the lines show the trend of the average results from three trials under each setting with respect to the number of shots. In each subfigure, we report the natural accuracy (dashed line) in the upper half and the robust accuracy (solid line) in the lower half. Standard deviations across multiple trials are included in the appendix section "Detailed Results for Adversarial Few-shot Learning".
  • Figure 4: Instability analysis for DTD, OxfordPets, and Caltech101. We report the model performance (%) w.r.t. the ratio ($\lambda$) between the natural and robust terms in the training objective. The results of deep prompt interaction from text to image are plotted as a red line, while those from image to text are plotted as a blue line.
  • Figure 5: Training loss curve under both stable and unstable settings. We report the total, natural, and robust loss during the whole training stage.
  • ...and 4 more figures