ADAPT to Robustify Prompt Tuning Vision Transformers

Masih Eskandar, Tooba Imtiaz, Zifeng Wang, Jennifer Dy

TL;DR

This work addresses two problems: Vision Transformers are vulnerable to adversarial attacks, and defenses that fine-tune the full model are expensive to store per task. It shows that standard adversarial defenses fail under prompt tuning because of gradient obfuscation, and it introduces ADAPT, an adaptive adversarial training framework that conditions on the prompts during training to make them robust while keeping the backbone frozen. Using adaptive attacks and two adversarial loss variants (Cross Entropy and KL Divergence), ADAPT achieves robustness competitive with full fine-tuning while tuning only about 1% of the parameters, with strong results on CIFAR-10, CIFAR-100, and Imagenette, including under black-box attacks. The approach demonstrates that prompt tuning can be made robust and scalable for downstream tasks, which benefits memory-efficient deployment of large vision models.
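
The summary above names two loss variants but does not give their formulas. As a rough illustration only, the sketch below shows what a cross-entropy variant and a KL-divergence variant of an adversarial loss commonly look like in the adversarial training literature. The function names, the `beta` weight, and the example logits are all hypothetical and are not taken from the paper; the exact ADAPT objectives may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p_logits, q_logits):
    """KL(softmax(p_logits) || softmax(q_logits))."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce_variant(adv_logits, label):
    """Assumed CE-style loss: classify the adversarial example correctly."""
    return cross_entropy(adv_logits, label)

def kl_variant(clean_logits, adv_logits, label, beta=1.0):
    """Assumed KL-style loss: clean cross-entropy plus a penalty for
    adversarial predictions drifting away from the clean predictions.
    `beta` is a hypothetical trade-off weight."""
    return cross_entropy(clean_logits, label) + beta * kl_divergence(clean_logits, adv_logits)

# Hypothetical logits for one 3-class example, before and after an attack.
clean_logits = [2.0, 0.0, -1.0]
adv_logits = [0.5, 1.0, -1.0]
label = 0

print("CE variant:", ce_variant(adv_logits, label))
print("KL variant:", kl_variant(clean_logits, adv_logits, label))
```

In both forms only the prompt parameters would receive gradient updates under prompt tuning; the sketch omits the model entirely and operates on logits to keep the comparison of the two losses visible.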

Abstract

The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, which can have billions of parameters, for each task. At the same time, parameter-efficient prompt tuning is used to adapt large transformer-based models to downstream tasks without the need to save large copies. In this paper, we examine parameter-efficient prompt tuning of Vision Transformers for downstream tasks through the lens of robustness. We show that previous adversarial defense methods, when applied to the prompt tuning paradigm, suffer from gradient obfuscation and are vulnerable to adaptive attacks. We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm. Our method achieves competitive robust accuracy of ~40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only ~1% of the number of parameters.

Paper Structure

This paper contains 34 sections, 13 equations, 1 figure, 10 tables, and 1 algorithm.

Figures (1)

  • Figure 1: Existing methods do not exhibit robustness to adaptive attacks. Comparison of the cross-entropy loss of a random batch along two adversarial perturbation directions. Left (a) shows the loss values of a prompt trained with traditional adversarial training; right (b) shows the loss values of a prompt trained with $\text{ADAPT}_{CE}$. Adversarial Training + Prompt Tuning does not exhibit a significant increase in the loss along the traditional PGD direction, but shows significant vulnerability to adaptive attacks. In contrast, the proposed method $\text{ADAPT}_{CE}$ exhibits robustness to both perturbation directions.
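
The two perturbation directions in Figure 1 can be illustrated with a toy example. The sketch below is not the paper's model or algorithm: it assumes a hypothetical two-feature logistic classifier in which the prompt rescales each frozen weight, and it reads the "traditional" direction as a signed-gradient step computed without the prompt and the "adaptive" direction as the same step computed with the prompt in place. All names and numbers are made up for illustration.

```python
import math

def logit(x, w, p):
    """Toy prompted model: the prompt p rescales each frozen weight in w."""
    return sum(wi * (1.0 + pi) * xi for wi, pi, xi in zip(w, p, x))

def loss(x, y, w, p):
    """Binary logistic loss for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * logit(x, w, p)))

def sign(v):
    return (v > 0) - (v < 0)

def signed_gradient_step(x, y, w, p, eps):
    """One L-infinity step of size eps along the sign of d(loss)/dx.

    d(loss)/dx_i = -y * sigmoid(-y * z) * w_i * (1 + p_i), and the sigmoid
    factor is positive, so the sign depends only on y and the scaled weight.
    """
    direction = [-y * sign(wi * (1.0 + pi)) for wi, pi in zip(w, p)]
    return [xi + eps * d for xi, d in zip(x, direction)]

# Hypothetical frozen weights, tuned prompt, and a single labeled input.
w = [1.0, -2.0]
prompt = [0.5, -1.75]
no_prompt = [0.0, 0.0]
x, y, eps = [1.0, 1.0], 1, 0.5

# Traditional direction: the attacker's gradient ignores the prompt.
x_trad = signed_gradient_step(x, y, w, no_prompt, eps)
# Adaptive direction: the attacker's gradient is conditioned on the prompt.
x_adapt = signed_gradient_step(x, y, w, prompt, eps)

# Both perturbed inputs are evaluated on the prompted model.
loss_clean = loss(x, y, w, prompt)
loss_trad = loss(x_trad, y, w, prompt)
loss_adapt = loss(x_adapt, y, w, prompt)

print("clean:", loss_clean)
print("traditional direction:", loss_trad)
print("adaptive direction:", loss_adapt)
```

In this toy setting the step computed without the prompt leaves the loss of the prompted model unchanged, while the step computed with the prompt raises it. That mirrors the pattern in panel (a): a defense can look robust along the traditional direction and still be vulnerable once the attack accounts for the prompt.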