Intra-neuronal attention within language models: Relationships between activation and semantics

Michael Pichat, William Pogrund, Paloma Pichat, Armanouche Gasparian, Samuel Demarchi, Corbet Alois Georgeon, Michael Veillet-Guillem

TL;DR

This work investigates whether perceptron-type neurons in language models exhibit intra-neuronal attention, that is, whether a segmentation of tokens by activation level corresponds to a segmentation of the same tokens by category. Using GPT-2 XL and a micro-explainability framework, it applies two complementary approaches across the model's first two layers: a top-down approach that clusters tokens into categories, and a bottom-up approach that segments the activation range to detect categorically homogeneous groups. The results show a subtle but systematic relation between tokens with very high activation and more homogeneous categorical segments, together with an interleaving of activation ranges across categories that rules out a strict one-to-one mapping. These findings suggest a plausible mechanism for intra-neuronal conceptualization that could seed higher-order categorical abstractions in subsequent layers, while also highlighting the limits of observing such phenomena through embedding-based semantic proxies.

Abstract

This study investigates the ability of perceptron-type neurons in language models to perform intra-neuronal attention; that is, to identify different homogeneous categorical segments within the synthetic thought category they encode, based on a segmentation of specific activation zones for the tokens to which they are particularly responsive. The objective of this work is therefore to determine to what extent formal neurons can establish a homomorphic relationship between activation-based and categorical segmentations. The results suggest the existence of such a relationship, albeit tenuous, only at the level of tokens with very high activation levels. This intra-neuronal attention subsequently enables categorical restructuring processes at the level of neurons in the following layer, thereby contributing to the progressive formation of high-level categorical abstractions.
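
As a rough illustration of what comparing an activation-based segmentation with a categorical one can look like, the following minimal sketch splits a neuron's tokens into rank-based activation bands and measures how dominated each band is by a single category label. It is not the authors' procedure: the function names `activation_bands` and `band_purity`, the equal-size banding, the purity measure, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def activation_bands(activations, n_bands=4):
    """Assign each token to an activation band by rank (0 = lowest band).

    Hypothetical helper: equal-size rank bands are an assumption of this sketch.
    """
    order = sorted(range(len(activations)), key=lambda i: activations[i])
    bands = [0] * len(activations)
    for rank, idx in enumerate(order):
        bands[idx] = min(rank * n_bands // len(activations), n_bands - 1)
    return bands


def band_purity(bands, categories, n_bands=4):
    """Share of the most frequent category label inside each non-empty band."""
    purity = {}
    for b in range(n_bands):
        labels = [c for band, c in zip(bands, categories) if band == b]
        if labels:
            purity[b] = Counter(labels).most_common(1)[0][1] / len(labels)
    return purity


# Toy data (invented): low-activation tokens mix two categories,
# high-activation tokens all belong to one category.
activations = [0.1, 0.2, 0.3, 0.4, 2.1, 2.3, 2.2, 2.4]
categories = ["a", "b", "a", "b", "c", "c", "c", "c"]

bands = activation_bands(activations, n_bands=2)
purity = band_purity(bands, categories, n_bands=2)
print(purity)
```

In this toy case the upper band is categorically homogeneous while the lower band is mixed, which mirrors the kind of pattern the abstract describes for very high activations. A real analysis would take the category labels from a clustering of token embeddings or a similar source rather than from hand-written labels.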

Paper Structure

This paper contains 23 sections, 6 equations, 9 figures, 10 tables.

Figures (9)

  • Figure :
  • Figure : Graph n°1 : Comparison of mean activations between categorical clusters from hierarchical classification on tokens' embeddings (layer 0).
  • Figure : Graph n°2 : Comparison of mean activations between categorical clusters from hierarchical classification on tokens' embeddings (layer 1).
  • Figure : Graph n°3 : Comparison of mean activations between categorical clusters from GPT-4 clustering prompt (layer 0).
  • Figure : Graph n°4 : Comparison of mean activations between categorical clusters from GPT-4 clustering prompt (layer 1).
  • ...and 4 more figures