Finding Skill Neurons in Pre-trained Transformer-based Language Models

Xiaozhi Wang; Kaiyue Wen; Zhengyan Zhang; Lei Hou; Zhiyuan Liu; Juanzi Li

TL;DR

The paper investigates how task-solving skills are distributed across the parameters of Transformer-based pre-trained language models by identifying skill neurons: neurons whose activations on soft prompts strongly predict task labels after prompt tuning. Using RoBERTa Base and seven NLP tasks, the authors show that skill neurons emerge for the tasks studied and stably across random trials, that they are task-specific (similar tasks have similar skill-neuron distributions), and that perturbing them significantly degrades task performance. They argue that these neurons are most likely formed during pre-training rather than created by tuning, since the same neurons remain crucial under other methods that freeze neuron weights, such as adapter-based tuning and BitFit, and that they do not simply arise from word-level selectivity. Applications include accelerating Transformers through network pruning and building better cross-task transferability indicators.

Abstract

Transformer-based pre-trained language models have demonstrated superior performance on various natural language processing tasks. However, it remains unclear how the skills required to handle these tasks distribute among model parameters. In this paper, we find that after prompt tuning for specific tasks, the activations of some neurons within pre-trained Transformers are highly predictive of the task labels. We dub these neurons skill neurons and confirm they encode task-specific skills by finding that: (1) Skill neurons are crucial for handling tasks. Performances of pre-trained Transformers on a task significantly drop when corresponding skill neurons are perturbed. (2) Skill neurons are task-specific. Similar tasks tend to have similar distributions of skill neurons. Furthermore, we demonstrate the skill neurons are most likely generated in pre-training rather than fine-tuning by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods freezing neuron weights, such as the adapter-based tuning and BitFit. We also explore the applications of skill neurons, including accelerating Transformers with network pruning and building better transferability indicators. These findings may promote further research on understanding Transformers. The source code can be obtained from https://github.com/THU-KEG/Skill-Neuron.
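
To make "activations that are highly predictive of the task labels" concrete, the following is a minimal sketch of one way such a score could be computed for a single neuron on a binary task. It assumes a simple thresholding scheme: a baseline activation (here taken to be the mean over training examples) splits held-out examples into two predicted classes, and the score is the resulting accuracy or its complement, whichever is larger. The function name `neuron_predictivity`, the choice of baseline, and the toy activations are illustrative assumptions, not the authors' code; the paper and repository give the exact definition.

```python
from statistics import mean


def neuron_predictivity(train_acts, dev_acts, dev_labels):
    """Threshold-based predictivity of one neuron for a binary task.

    train_acts: activations on training examples (used only for the baseline)
    dev_acts, dev_labels: activations and 0/1 labels on held-out examples
    """
    # Assumed baseline: the neuron's mean activation over the training examples.
    baseline = mean(train_acts)
    # Predict label 1 whenever the activation exceeds the baseline.
    preds = [1 if a > baseline else 0 for a in dev_acts]
    acc = sum(p == y for p, y in zip(preds, dev_labels)) / len(dev_labels)
    # A neuron that is reliably below baseline on label 1 is just as informative,
    # so take the better of the two directions.
    return max(acc, 1.0 - acc)


# Toy activations (made up for illustration).
train = [0.9, 1.1, -1.0, -0.8]
dev_acts = [1.2, 0.7, -0.9, -1.1, 0.8]
dev_labels = [1, 1, 0, 0, 0]
score = neuron_predictivity(train, dev_acts, dev_labels)
```

Under this scheme a score near 0.5 means the neuron carries little label information, while a score near 1.0 marks a candidate skill neuron.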

Paper Structure (40 sections, 6 equations, 15 figures, 6 tables)

Introduction
Preliminary
Prompt Tuning
Neurons in Transformers
Investigation Setup
Finding Skill Neurons
Binary Classification Task
Multi-class Classification Task
Do Skill Neurons Encode Skills?
Skill Neurons Generally and Stably Emerge
Generality.
Stability.
Skill Neurons are Crucial for Handling Tasks
Skill Neurons are Task-specific
Skill Neurons are not from Word Selectivity
...and 25 more sections

Figures (15)

Figure 1: Histogram of activation of a neuron within RoBERTa$_{\textsc{Base}}$ on positive-label (blue) and negative-label (orange) sentences in SST-2 validation set.
Figure 2: Distribution of activations of two neurons on a soft prompt for samples in MNLI validation set. Dashed lines indicate baseline activations of the two neurons.
Figure 3: Histogram of neuron predictivities for IMDB. Error bars indicate $\pm 1$ s.e.m. over $5$ random trials.
Figure 4: Accuracy on Tweet drops as the neuron perturbation rate increases. Error bars indicate $\pm 1$ s.e.m. over $5$ random trials. Neurons are perturbed in descending order of their predictivities for different tasks, or in random order (the "Random" curve).
Figure 5: Spearman's rank correlations between the neuron predictivity orders of different tasks. Results are averaged over all the layers.
...and 10 more figures
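
The comparison behind Figure 5 can be illustrated with a small sketch: given per-neuron predictivity scores for two tasks, Spearman's rank correlation measures how similarly the two tasks order the neurons. The code below is a plain-Python version (Pearson correlation of average ranks, with ties sharing their mean rank). The helper names and the score lists are hypothetical, and the per-layer averaging mentioned in the caption is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical predictivities of the same four neurons on two tasks.
task_a = [0.90, 0.60, 0.70, 0.50]
task_b = [0.80, 0.55, 0.75, 0.50]
rho = spearman(task_a, task_b)
```

A value of `rho` close to 1 indicates that the two tasks rank neurons almost identically, which is the pattern the abstract describes for similar tasks.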
