Neurons in Large Language Models: Dead, N-gram, Positional

Elena Voita, Javier Ferrando, Christoforos Nalmpantis

TL;DR

This work probes the inner workings of FFN layers in OPT-based LLMs using a scalable, single-GPU approach. It reveals a striking dichotomy: a sparse, early portion of the network where many neurons are dead and where token- and n-gram-detector neurons emerge, alongside later layers that remain active and combinatorial in function; moreover, some detectors explicitly suppress current-input information rather than merely promoting next-token candidates. The study also uncovers positional neurons that encode token position independently of content, challenging the canonical key-value memory view of FFNs and showing two-stage positional dynamics across model scales. Collectively, the findings illuminate how FFN layers contribute to representation and output in ways that depend on scale and architecture, with implications for interpretability and future analyses of transformer internals.

Abstract

We analyze a family of large language models in a manner lightweight enough to be done on a single GPU. Specifically, we focus on the OPT family of models ranging from 125m to 66b parameters and rely only on whether an FFN neuron is activated or not. First, we find that the early part of the network is sparse and represents many discrete features. Here, many neurons (more than 70% in some layers of the 66b model) are "dead", i.e., they never activate on a large collection of diverse data. At the same time, many of the alive neurons are reserved for discrete features and act as token and n-gram detectors. Interestingly, their corresponding FFN updates not only promote next-token candidates, as could be expected, but also explicitly focus on removing information about the tokens that trigger them, i.e., the current input. To the best of our knowledge, this is the first example of mechanisms specialized in removing (rather than adding) information from the residual stream. With scale, models become more sparse in the sense that they have more dead neurons and token detectors. Finally, some neurons are positional: whether they are activated depends largely (or solely) on position and less so (or not at all) on textual data. We find that smaller models have sets of neurons acting as position range indicators, while larger models operate in a less explicit manner.
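
To make the notion of a "dead" neuron concrete, here is a minimal sketch of how the fraction of dead neurons and the average activation frequency of the remaining ones could be computed for one layer. It is an illustration, not the authors' code: the function name `neuron_stats`, the `activations[token][neuron]` layout, and the toy data are hypothetical, and a neuron is assumed to count as activated when its post-activation value is strictly positive.

```python
from typing import Dict, List, Sequence, Union


def neuron_stats(
    activations: Sequence[Sequence[float]],
) -> Dict[str, Union[float, List[int]]]:
    """Summarize one FFN layer from recorded activations.

    activations[t][n] is the post-activation value of neuron n at token t
    (a hypothetical layout chosen for this sketch).
    """
    num_tokens = len(activations)
    num_neurons = len(activations[0])

    # Count, per neuron, the number of tokens on which it fired (value > 0).
    fire_counts = [0] * num_neurons
    for row in activations:
        for n, value in enumerate(row):
            if value > 0:
                fire_counts[n] += 1

    # A neuron is "dead" if it never fired on any token in the collection.
    dead = [n for n, count in enumerate(fire_counts) if count == 0]

    # Activation frequency is averaged over alive neurons only.
    alive_freqs = [count / num_tokens for count in fire_counts if count > 0]
    mean_alive = sum(alive_freqs) / len(alive_freqs) if alive_freqs else 0.0

    return {
        "dead_fraction": len(dead) / num_neurons,
        "dead_neurons": dead,
        "mean_alive_frequency": mean_alive,
    }


# Toy data: 4 tokens, 4 neurons. Neurons 0 and 2 never fire.
toy = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.9],
]
stats = neuron_stats(toy)
print(stats)
```

In practice the per-neuron fire counts would be accumulated in a streaming fashion over a large and diverse dataset, since only a binary activated-or-not signal per neuron is needed.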

Paper Structure (48 sections, 7 equations, 16 figures)


Figures (16)

  • Figure 1: (a) Percentage of "dead" neurons; (b) average neuron activation frequency among non-dead neurons.
  • Figure 2: Neurons categorized by the number of unigrams (i.e., tokens) able to trigger them. First half of the network, alive neurons only.
  • Figure 3: (a) Number of token-detecting neurons; (b) number of tokens that have a neuron detecting them: solid line -- per layer, dashed -- cumulative over layers.
  • Figure 4: Number of tokens covered in each layer, indicating tokens that are (i) new overall and (ii) new compared to the previous layer.
  • Figure 5: Examples of the top promoted and suppressed tokens for token-detecting neurons (Ġ is a special symbol denoting the space before a word -- in the OPT tokenizers, it is part of the word); OPT-66b model.
  • ...and 11 more figures