
Property Neurons in Self-Supervised Speech Transformers

Tzu-Quan Lin, Guan-Ting Lin, Hung-yi Lee, Hao Tang

TL;DR

This work identifies a set of property neurons in the feedforward layers of Transformers to study how speech-related properties, such as phones, gender, and pitch, are stored. It also shows that protecting property neurons during pruning is significantly more effective than norm-based pruning.

Abstract

There have been many studies analyzing self-supervised speech Transformers, in particular through layer-wise analysis. It is, however, desirable to have an approach that can pinpoint the exact subset of neurons responsible for a particular property of speech, which would make the analysis amenable to model pruning and model editing. In this work, we identify a set of property neurons in the feedforward layers of Transformers to study how speech-related properties, such as phones, gender, and pitch, are stored. When the neurons of a particular property are removed (a simple form of model editing), the respective downstream performance degrades significantly, showing the importance of the property neurons. We apply this approach to pruning the feedforward layers in Transformers, where most of the model parameters reside. We show that protecting property neurons during pruning is significantly more effective than norm-based pruning. The code for identifying property neurons is available at https://github.com/nervjack2/PropertyNeurons.
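To make the contrast between norm-based pruning and pruning that protects property neurons concrete, the following is a minimal sketch and not the authors' implementation. It assumes each feedforward neuron is scored by the L2 norm of its input and output weights, and that protected neurons are always kept before the remaining budget is filled by score. The function name `select_neurons_to_keep` and its parameters are hypothetical.

```python
from math import sqrt


def select_neurons_to_keep(w_in, w_out, n_keep, protected=frozenset()):
    """Return the sorted indices of feedforward neurons to keep.

    w_in[i]  : input weights of neuron i (assumed layout, one row per neuron)
    w_out[i] : output weights of neuron i (assumed layout, one row per neuron)
    n_keep   : number of neurons that survive pruning
    protected: indices that must survive (e.g. identified property neurons)
    """
    n_neurons = len(w_in)
    if len(w_out) != n_neurons:
        raise ValueError("w_in and w_out must describe the same neurons")
    if not len(protected) <= n_keep <= n_neurons:
        raise ValueError("n_keep must lie between len(protected) and n_neurons")

    def score(i):
        # Assumed criterion: L2 norm over a neuron's input and output weights.
        return sqrt(sum(x * x for x in w_in[i]) + sum(x * x for x in w_out[i]))

    keep = set(protected)
    # Fill the remaining budget with the highest-norm unprotected neurons.
    candidates = sorted(
        (i for i in range(n_neurons) if i not in keep), key=score, reverse=True
    )
    keep.update(candidates[: n_keep - len(keep)])
    return sorted(keep)


# Toy layer with four neurons; neuron 1 has a small norm.
w_in = [[3.0], [0.1], [2.0], [0.2]]
w_out = [[0.0], [0.0], [0.0], [0.0]]
norm_only = select_neurons_to_keep(w_in, w_out, n_keep=2)
with_protection = select_neurons_to_keep(w_in, w_out, n_keep=2, protected={1})
```

With an empty `protected` set the function reduces to plain norm-based selection. Passing a non-empty set changes which low-norm neurons survive, which is the effect the abstract evaluates.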

Paper Structure (17 sections, 5 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: An illustration of how feedforward networks in Transformers can be regarded as a type of neural memory.
  • Figure 2: The probability of neurons activated when a phone [ah] is present. The neurons are sorted according to the probability.
  • Figure 3: The results of multidimensional scaling on the activation patterns of phones conditioned on broad phone classes, gender, and pitch. Different colors represent different groups. For each condition, we show the layer with the highest silhouette score (Rousseeuw, 1987), i.e., the 8th layer, the 1st layer, and the 1st layer, respectively. We consider [r], [y], [w], and [l] as voiced consonants here.
  • Figure 4: The result of performing multidimensional scaling on the activation patterns of phones for different properties of speech. We report silhouette score to measure cluster tightness.
  • Figure 5: The silhouette score of multidimensional scaling on the activation patterns of phones for different speech models. We report the highest score among all layers for each model and each property. MelHuBERT-PR and MelHuBERT-SID denote MelHuBERT fine-tuned on phoneme recognition and speaker identification, respectively.
  • ...and 4 more figures
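
The quantity plotted in Figure 2, the probability that each neuron is activated when a given phone is present, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the released code: a neuron is taken to be "activated" on a frame when its feedforward activation is strictly positive, and the function names and frame-level inputs are hypothetical.

```python
def activation_probability(activations, labels, target):
    """Fraction of frames labeled `target` on which each neuron is active.

    activations: one list of per-neuron activation values per frame
    labels     : one phone label per frame, aligned with `activations`
    target     : the phone of interest, e.g. "ah"
    """
    frames = [a for a, y in zip(activations, labels) if y == target]
    if not frames:
        raise ValueError(f"no frames labeled {target!r}")
    n_neurons = len(frames[0])
    # Assumption: "activated" means a strictly positive activation value.
    return [
        sum(1 for f in frames if f[j] > 0) / len(frames)
        for j in range(n_neurons)
    ]


def sort_neurons_by_probability(probs):
    """Neuron indices ordered from most to least frequently activated."""
    return sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)


# Toy example: three frames, three neurons, two frames labeled "ah".
activations = [[1.0, -0.5, 0.2], [0.3, 0.0, -1.0], [-2.0, 4.0, 1.0]]
labels = ["ah", "ah", "iy"]
probs_ah = activation_probability(activations, labels, "ah")
order_ah = sort_neurons_by_probability(probs_ah)
```

Sorting the neurons by this probability gives the ordering described in the Figure 2 caption. The activation threshold used here is an assumption and may differ from the paper's.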