ProTransformer: Robustify Transformers via Plug-and-Play Paradigm

Zhichao Hou, Weizhi Gao, Yuchen Shen, Feiyi Wang, Xiaorui Liu

TL;DR

This paper introduces a novel robust attention mechanism that enhances the resilience of transformer-based architectures. It can be integrated into existing transformers as a plug-and-play layer, improving their robustness without additional training or fine-tuning.

Abstract

Transformer-based architectures have dominated various areas of machine learning in recent years. In this paper, we introduce a novel robust attention mechanism designed to enhance the resilience of transformer-based architectures. Crucially, this technique can be integrated into existing transformers as a plug-and-play layer, improving their robustness without the need for additional training or fine-tuning. Through comprehensive experiments and ablation studies, we demonstrate that our ProTransformer significantly enhances the robustness of transformer models across a variety of prediction tasks, attack mechanisms, backbone architectures, and data domains. Notably, without further fine-tuning, the ProTransformer consistently improves the performance of vanilla transformers by 19.5%, 28.3%, 16.1%, and 11.4% for BERT, ALBERT, DistilBERT, and RoBERTa, respectively, under the classical TextFooler attack. Furthermore, ProTransformer shows promising resilience in large language models (LLMs) against prompting-based attacks, improving the performance of T5 and LLaMA by 24.8% and 17.8%, respectively, and enhancing Vicuna by an average of 10.4% against the Jailbreaking attack. Beyond the language domain, ProTransformer also demonstrates outstanding robustness in both vision and graph domains.

Paper Structure

This paper contains 66 sections, 2 theorems, 16 equations, 24 figures, 30 tables, 2 algorithms.

Key Result

Lemma 3.1

Suppose the loss objective is defined as in Eq. eq:robust_regression, where $\rho \circ \text{sqrt}(\cdot)$ is any non-convex function. For any fixed point ${\mathbf{z}}^{(k)}$, there exists a convex localized upper bound whose weights are $w_j^{(k)}=\frac{\rho^\prime(\|{\mathbf{v}}_j-{\mathbf{z}}^{(k)}\|)}{2\|{\mathbf{v}}_j-{\mathbf{z}}^{(k)}\|}$, where $\rho^\prime$ is the first derivative of $\rho$.
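The weight formula in Lemma 3.1 can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes, since this summary does not show Eq. eq:robust_regression, that the objective has the form $\sum_j a_j \rho(\|{\mathbf{v}}_j-{\mathbf{z}}\|)$ with attention-like coefficients $a_j$, and that the convex upper bound is a weighted sum of squared distances, so its minimizer is a reweighted average. The names `lemma_weights`, `reweighted_mean`, and `l1_rho_prime` are hypothetical, and the choice $\rho(r)=r$ is made only for illustration.

```python
import numpy as np

def lemma_weights(V, z, rho_prime, eps=1e-8):
    """Weights w_j = rho'(||v_j - z||) / (2 ||v_j - z||) from Lemma 3.1."""
    r = np.linalg.norm(V - z, axis=1)
    r = np.maximum(r, eps)  # guard against division by zero (an added safeguard)
    return rho_prime(r) / (2.0 * r)

def reweighted_mean(V, a, rho_prime, num_iters=50):
    """Repeatedly minimize the ASSUMED quadratic bound sum_j a_j w_j ||v_j - z||^2.

    The minimizer of that quadratic is the average of the rows of V
    weighted by a_j * w_j, so each step is a closed-form update.
    """
    z = a @ V / a.sum()  # start from the ordinary weighted mean
    for _ in range(num_iters):
        coef = a * lemma_weights(V, z, rho_prime)
        z = coef @ V / coef.sum()
    return z

def l1_rho_prime(r):
    """Derivative of the illustrative penalty rho(r) = r."""
    return np.ones_like(r)

# Nine value vectors near the origin plus one extreme outlier.
rng = np.random.default_rng(0)
V = np.vstack([rng.normal(0.0, 0.1, size=(9, 2)), [[100.0, 100.0]]])
a = np.full(10, 0.1)  # uniform coefficients, standing in for attention scores

plain = a @ V / a.sum()                       # pulled strongly toward the outlier
robust = reweighted_mean(V, a, l1_rho_prime)  # stays close to the main cluster
print("plain mean:     ", plain)
print("reweighted mean:", robust)
```

With $\rho(r)=r$ the weights reduce to $1/(2r)$, so distant vectors are down-weighted and the iteration coincides with the classical Weiszfeld scheme for a weighted geometric median. Other penalties only change `rho_prime`.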

Figures (24)

  • Figure 1: Various attack mechanisms on language models. Classic text attacks modify the input content using typos or synonyms; Prompt attacks perturb the prompt template within the input; and Jailbreaks append adversarial, non-semantic suffixes to manipulate the model into producing malicious outputs.
  • Figure 2: Overview of ProTransformer. ProAttention can be plugged into pretrained transformers without additional training. The ProTransformer is versatile and can be applied across various domains, including language, image, and graph.
  • Figure 3: Different $\rho(z)$.
  • Figure 4: Ablation studies.
  • Figure 5: Prompt attack results.
  • ...and 19 more figures

Theorems & Definitions (6)

  • Lemma 3.1: Localized Upper Bound
  • proof
  • Theorem 3.2: Convergence guarantee
  • proof
  • proof
  • proof