On the Relationship between Skill Neurons and Robustness in Prompt Tuning

Leon Ackermann, Xenia Ohmer

TL;DR

Higher adversarial robustness may be related to a model’s ability to consistently activate the relevant skill neurons on adversarial data.

Abstract

Prompt Tuning is a popular parameter-efficient finetuning method for pre-trained large language models (PLMs). Based on experiments with RoBERTa, it has been suggested that Prompt Tuning activates specific neurons in the transformer's feed-forward networks that are highly predictive and selective for the given task. In this paper, we study the robustness of Prompt Tuning in relation to these "skill neurons", using RoBERTa and T5. We show that prompts tuned for a specific task are transferable to tasks of the same type but are not very robust to adversarial data. While prompts tuned for RoBERTa yield below-chance performance on adversarial data, prompts tuned for T5 are slightly more robust and retain above-chance performance in two out of three cases. At the same time, we replicate the finding that skill neurons exist in RoBERTa and further show that skill neurons also exist in T5. Interestingly, the skill neurons of T5 determined on non-adversarial data are also among the most predictive neurons on the adversarial data, which is not the case for RoBERTa. We conclude that higher adversarial robustness may be related to a model's ability to consistently activate the relevant skill neurons on adversarial data.
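
A minimal sketch of how a neuron's predictivity could be scored, assuming a thresholding scheme in the spirit of the original skill-neuron work: a neuron's baseline is its mean activation, the neuron "predicts" the positive class when its activation exceeds that baseline, and its predictivity is the resulting accuracy, flipped if the neuron is anti-correlated with the label. The paper's exact definition is given in its equations, which are not reproduced here. `neuron_predictivity` and `rank_neurons` are hypothetical helpers, and the activations are assumed to be precomputed (one scalar per example per feed-forward neuron).

```python
from statistics import mean


def neuron_predictivity(activations, labels, baseline=None):
    """Hypothetical helper: predictivity of a single neuron on a binary task.

    activations: one activation value per example for this neuron
                 (assumed precomputed, e.g. averaged over prompt-token positions)
    labels:      gold labels, 0 or 1, aligned with `activations`
    baseline:    activation threshold; if None, the mean activation is used
                 (ideally it is estimated on training data, not on the scored data)
    """
    if not activations or len(activations) != len(labels):
        raise ValueError("need one label per activation")
    if baseline is None:
        baseline = mean(activations)
    # Treat the neuron as a classifier: "positive" when it fires above baseline.
    accuracy = mean(
        1.0 if (a > baseline) == bool(y) else 0.0
        for a, y in zip(activations, labels)
    )
    # An anti-correlated neuron is just as informative, so take the better direction.
    return max(accuracy, 1.0 - accuracy)


def rank_neurons(activation_matrix, labels):
    """Hypothetical helper: score every neuron and sort them, most predictive first.

    activation_matrix[i][j] is the activation of neuron j on example i.
    Returns (neuron indices sorted by predictivity, list of predictivities).
    """
    n_neurons = len(activation_matrix[0])
    scores = [
        neuron_predictivity([row[j] for row in activation_matrix], labels)
        for j in range(n_neurons)
    ]
    order = sorted(range(n_neurons), key=lambda j: scores[j], reverse=True)
    return order, scores


# Neuron 0 separates the classes; neuron 1 is constant and carries no signal.
acts = [[0.1, 0.5], [0.9, 0.5], [0.2, 0.5], [0.8, 0.5]]
order, scores = rank_neurons(acts, [0, 1, 0, 1])
print(order, scores)  # [0, 1] [1.0, 0.5]
```

Under this assumed scheme, the skill neurons would be the head of `order`. Checking whether the same neurons stay near the top when the activations and labels come from an adversarial dataset is one way to probe the consistency the abstract refers to.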

Paper Structure (35 sections, 5 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Prompt transferability. For each seed, we divide the accuracy on the target task obtained with the source task's prompt by the accuracy on the target task obtained with the target task's own prompt, and report the average across seeds.
  • Figure 2: Distribution of neuron predictivities (box plots) on top of model accuracy (bar plots).
  • Figure 3: Spearman rank correlation between the neuron predictivities for different datasets.
  • Figure 4: Model accuracies on IMDB when suppressing skill neurons (solid lines) versus randomly selected neurons (dashed lines).
  • Figure 5: Model accuracies on each adversarial dataset when suppressing the skill neurons determined for these tasks (solid lines) and when suppressing randomly selected neurons (dashed lines).
  • ...and 5 more figures
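
The quantities behind Figures 1 and 3 can be sketched in a few lines. `relative_transfer` follows the Figure 1 caption: per seed, the target-task accuracy obtained with the source task's prompt is divided by the target-task accuracy obtained with the target task's own prompt, and the ratios are averaged over seeds. `spearman` is a plain Spearman rank correlation (the Pearson correlation of ranks, with tied values sharing an average rank), as one might apply to two vectors of per-neuron predictivities computed on different datasets. The function names and toy numbers are hypothetical; this is an illustration, not the authors' code.

```python
from statistics import mean


def relative_transfer(source_prompt_acc, target_prompt_acc):
    """Hypothetical helper for the Figure 1 metric.

    source_prompt_acc[s]: accuracy on the target task with the source task's prompt (seed s)
    target_prompt_acc[s]: accuracy on the target task with its own prompt (seed s)
    """
    if not source_prompt_acc or len(source_prompt_acc) != len(target_prompt_acc):
        raise ValueError("need one accuracy pair per seed")
    # Ratio per seed first, then the average across seeds.
    return mean(s / t for s, t in zip(source_prompt_acc, target_prompt_acc))


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Toy numbers: three seeds, and predictivities of four neurons on two datasets.
print(relative_transfer([0.80, 0.72, 0.66], [0.80, 0.90, 0.88]))  # approximately 0.85
print(spearman([0.91, 0.55, 0.73, 0.60], [0.88, 0.52, 0.70, 0.65]))  # 1.0
```

Dividing per seed before averaging, as the caption specifies, is not the same as dividing the seed-averaged accuracies; when accuracies vary across seeds the two can differ.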