Intra-neuronal attention within language models: Relationships between activation and semantics
Michael Pichat, William Pogrund, Paloma Pichat, Armanouche Gasparian, Samuel Demarchi, Corbet Alois Georgeon, Michael Veillet-Guillem
TL;DR
This work investigates whether perceptron-type neurons in language models exhibit intra-neuronal attention, that is, whether segmenting a neuron's tokens by activation level corresponds to segmenting them by category. Using GPT-2XL and a micro-explainability framework, it analyzes two complementary approaches across the model's first two layers: top-down clustering of tokens into categories, and bottom-up segmentation of activation to detect homogeneous groups. The results show a subtle but systematic relation between tokens with very high activation and more homogeneous categorical segments, along with activation interleaving that challenges a strict one-to-one mapping between activation segments and categorical segments. These findings suggest a plausible mechanism for intra-neuronal conceptualization that could seed higher-order categorical abstractions in subsequent layers, while also highlighting the limitations of observing such phenomena through embedding-based semantic proxies.
Abstract
This study investigates the ability of perceptron-type neurons in language models to perform intra-neuronal attention; that is, to identify different homogeneous categorical segments within the synthetic thought category they encode, based on a segmentation of specific activation zones for the tokens to which they are particularly responsive. The objective of this work is therefore to determine to what extent formal neurons can establish a homomorphic relationship between activation-based and categorical segmentations. The results suggest the existence of such a relationship, albeit tenuous, only at the level of tokens with very high activation levels. This intra-neuronal attention subsequently enables categorical restructuring processes at the level of neurons in the following layer, thereby contributing to the progressive formation of high-level categorical abstractions.
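
To make the comparison between activation-based and categorical segmentation concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that a neuron's tokens are split into equal-sized activation bands and that the categorical homogeneity of each band is approximated by the mean pairwise cosine similarity of token embeddings. The function names, the equal-sized banding, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def segment_by_activation(activations, n_segments):
    """Split token indices into equal-sized activation bands, highest first.

    Assumption: equal-sized bands; the paper may define activation zones differently.
    """
    order = np.argsort(activations)[::-1]  # indices sorted by descending activation
    return np.array_split(order, n_segments)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(vectors)
    # Remove the diagonal, since each vector has similarity 1 with itself.
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


def segment_homogeneity(activations, embeddings, n_segments=2):
    """Return one homogeneity score per activation band (highest band first)."""
    segments = segment_by_activation(activations, n_segments)
    return [mean_pairwise_cosine(embeddings[idx]) for idx in segments]


# Toy data (hypothetical): four high-activation tokens share one embedding
# direction, four low-activation tokens point in mutually spread directions.
toy_activations = np.array([10.0, 9.0, 8.0, 7.0, 1.0, 2.0, 3.0, 4.0])
toy_embeddings = np.array([
    [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
    [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0],
])
toy_scores = segment_homogeneity(toy_activations, toy_embeddings, n_segments=2)
```

In this toy setup the high-activation band scores higher than the low-activation band, which is the kind of pattern the abstract reports only for tokens with very high activation. With real data, the activations would come from a single neuron and the embeddings from whatever semantic proxy is chosen.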
