Understanding Zero-Shot Adversarial Robustness for Large-Scale Models

Chengzhi Mao; Scott Geng; Junfeng Yang; Xin Wang; Carl Vondrick

TL;DR

This work tackles zero-shot adversarial robustness for vision-language foundation models, focusing on how adaptation strategy and training objectives shape robustness to unseen tasks. It introduces TeCoA, a text-guided cross-modal contrastive loss that aligns adversarial visual features with correct text embeddings, and demonstrates its effectiveness with both finetuning and visual prompting. Across 16 datasets, including ImageNet, TeCoA substantially improves zero-shot robustness (average gains ~31 points over CLIP) and remains effective with unlabeled data via pseudo-labels. The results provide practical guidance for preserving zero-shot generalization while boosting robustness and establish a benchmark for future work in zero-shot adversarial robustness.

Abstract

Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversarial perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of adapting large-scale models for zero-shot adversarial robustness. We first identify two key factors during model adaptation, training losses and adaptation methods, that affect the model's zero-shot adversarial robustness. We then propose a text-guided contrastive adversarial training loss, which aligns the text embeddings and the adversarial visual features with contrastive learning on a small set of training data. We apply this training loss to two adaptation methods, model finetuning and visual prompt tuning. We find that visual prompt tuning is more effective in the absence of text guidance, while finetuning wins when text guidance is available. Overall, our approach significantly improves the zero-shot adversarial robustness over CLIP, with an average improvement of over 31 points across ImageNet and 15 zero-shot datasets. We hope this work can shed light on understanding the zero-shot adversarial robustness of large-scale models.
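The text-guided contrastive loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the loss is a cross-entropy over temperature-scaled cosine similarities between each (adversarial) image feature and one text embedding per class. The function names, the plain-list feature format, and the temperature value of 0.07 are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def text_guided_contrastive_loss(image_feats, text_embeds, labels, temperature=0.07):
    """Mean cross-entropy between image features and per-class text embeddings.

    image_feats: list of feature vectors (assumed to come from adversarial images)
    text_embeds: one embedding per class (assumed to come from a text encoder)
    labels:      index of the correct class for each image
    temperature: hypothetical value; the listing does not state one
    """
    text_unit = [l2_normalize(t) for t in text_embeds]
    total = 0.0
    for feat, label in zip(image_feats, labels):
        feat_unit = l2_normalize(feat)
        # Cosine similarity to every class text embedding, scaled by temperature.
        logits = [
            sum(a * b for a, b in zip(feat_unit, t)) / temperature
            for t in text_unit
        ]
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[label]
    return total / len(labels)


# Toy usage: two classes with orthogonal text embeddings.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = text_guided_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], texts, [0, 1])
misaligned = text_guided_contrastive_loss([[0.0, 1.0], [1.0, 0.0]], texts, [0, 1])
```

In this toy setup the loss is small when each image feature points toward the text embedding of its own class and large when it points toward the wrong class, which is the alignment pressure the abstract describes.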

Paper Structure (24 sections, 10 equations, 7 figures, 5 tables, 5 algorithms)

Introduction
Related Work
Model Adaptation for Zero-Shot Adversarial Robustness
Background and Problem Setup
Adapting the Large-Scale Models
Text-Guided Contrastive Adversarial Training
Experiments
Experimental Results
Analysis
Conclusion
Acknowledgement
Appendix
Experiments
Zero-shot Clean Accuracy of our Adapted Model
AutoAttack Experiment
...and 9 more sections

Figures (7)

Figure 1: (a, left) Despite CLIP's high performance on zero-shot image recognition tasks, it remains vulnerable when the input images are constructed adversarially. (b, right) Standard adversarial training improves robustness on the trained task (ImageNet), but comes at the expense of its zero-shot capability. Our paper studies how to adapt CLIP to achieve adversarial robustness on zero-shot tasks.
Figure 2: Adaptation methods for large-scale pretrained CLIP model for zero-shot adversarial robustness. (a) Linear probes that adapt the readout layer. (b) Partial finetuning that only updates the last few layers of the model. (c) Finetuning the whole model. (d) Adding visual prompting to the input image. (e) Appending visual prompt tokens to the input token sequence.
Figure 3: Text-Guided Contrastive Adversarial Learning. Instead of using one-hot embedding based supervision, we use adversarial contrastive learning with language supervision, which achieves better zero-shot robustness transferability.
Figure 4: Effect of training set size. We consider several setups where only a restricted number of samples from each training class are available during adversarial training with TeCoA. Training on more data in general improves zero-shot robustness, but does not affect clean performance much.
Figure 5: Zero-shot adversarial robustness under different perturbation bounds ($\epsilon=1,2,4/255$). We vary the perturbation bound for adversarial finetuning with TeCoA. Each adapted model is evaluated under attacks from the same bound seen during training. We show both the robust accuracy (left) and clean accuracy (right). Our defense is still effective on zero-shot tasks when the perturbation gets larger.
...and 2 more figures
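
The perturbation bounds in Figure 5 ($\epsilon=1,2,4/255$) limit how far each pixel may move from its original value. The sketch below shows one common way to build such a bounded perturbation, projected gradient ascent under an L-infinity constraint. The captions here do not say which attack is used, so the attack choice, the step size of eps/4, the step count, the [0, 1] pixel range, and the `grad_fn` callback are all assumptions made for illustration.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def pgd_linf(x, grad_fn, eps=1 / 255, n_steps=10, step=None):
    """Perturb x to increase a loss while keeping every pixel within eps of x.

    x:       flat list of pixel values, assumed to lie in [0, 1]
    grad_fn: hypothetical callback returning the loss gradient at a given input
    eps:     L-infinity bound, e.g. 1/255, 2/255, or 4/255
    step:    per-iteration step size; eps / 4 is an arbitrary default
    """
    step = eps / 4 if step is None else step
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        grad = grad_fn([xi + di for xi, di in zip(x, delta)])
        for i, g in enumerate(grad):
            # Move in the direction that increases the loss.
            moved = delta[i] + step * sign(g)
            # Project back into the eps-ball around the original pixel.
            moved = max(-eps, min(eps, moved))
            # Keep the perturbed pixel inside the valid range.
            delta[i] = min(1.0, max(0.0, x[i] + moved)) - x[i]
    return [xi + di for xi, di in zip(x, delta)]


# Toy usage: a constant gradient pushes the first pixel up and the second down.
clean = [0.5, 0.2]
adversarial = pgd_linf(clean, lambda z: [1.0, -1.0], eps=4 / 255)
```

With a real model, `grad_fn` would return the gradient of the training loss with respect to the input image, and the resulting adversarial image would be fed back into that loss during adversarial training.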
