On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception

Sanaz Saki Norouzi, Mohammad Masjedi, Pascal Hitzler

TL;DR

This work investigates whether grammar perception in LLMs aligns with human brain organization by tracing next-word POS predictions to neuron activations. It combines Integrated Gradients attribution, chi-squared statistical filtering, and concept-activation analysis (CAV/CAR) to identify and validate a POS-concept subspace dispersed across Llama-3's 32 layers, culminating in a classifier that predicts POS tags from key neuron activations with strong performance (e.g., up to $0.91$ accuracy in the final layer). The study provides empirical evidence that a small, targeted set of neurons encodes grammatical concepts and that disrupting these neurons markedly degrades POS predictions, supporting a brain-like mix of localized and distributed representations in LLMs. These results offer a path toward more interpretable and controllable language models and deepen the connection between cognitive neuroscience and NLP model analysis.
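The chi-squared filtering step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the binarization of attributions into "highly attributed or not", the 2x2 contingency layout, and the names `chi2_2x2` and `select_neurons` are all assumptions made for illustration. Only the $p < 0.05$ threshold comes from the source (Figure 2 caption).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom,
    no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-squared survival function
    # is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


def select_neurons(high_attr, has_tag, alpha=0.05):
    """Return indices of neurons whose attribution pattern is associated
    with a POS tag at significance level alpha.

    high_attr[i][j]: True if neuron j is highly attributed on example i
                     (hypothetical binarization of attribution scores).
    has_tag[i]:      True if the next word of example i has the target tag.
    """
    selected = []
    for j in range(len(high_attr[0])):
        a = b = c = d = 0
        for row, tag in zip(high_attr, has_tag):
            if row[j] and tag:
                a += 1
            elif row[j]:
                b += 1
            elif tag:
                c += 1
            else:
                d += 1
        _, p = chi2_2x2(a, b, c, d)
        if p < alpha:
            selected.append(j)
    return selected


# Toy data: neuron 0 fires exactly when the tag is present,
# neuron 1 fires on every other example regardless of the tag.
has_tag = [i < 10 for i in range(20)]
high_attr = [[i < 10, i % 2 == 0] for i in range(20)]
print(select_neurons(high_attr, has_tag))  # [0]
```

In this toy setup only the neuron whose attribution tracks the tag passes the filter. A real pipeline would need many more examples per tag and, plausibly, a correction for testing thousands of neurons at once.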

Abstract

Artificial Neural Networks, the building blocks of AI, were inspired by the human brain's network of neurons. Over the years, these networks have evolved to replicate complex capabilities of the brain, allowing them to handle tasks such as image and language processing. In the realm of Large Language Models (LLMs), there has been keen interest in making the language learning process more akin to that of humans. Neuroscientific research has shown that different grammatical categories are processed by different neurons in the brain, and we show that LLMs operate in a similar way. Using Llama 3, we identify the most important neurons associated with the prediction of words belonging to different part-of-speech tags. With this knowledge, we train a classifier on a dataset and show that the activation patterns of these key neurons can reliably predict part-of-speech tags on fresh data. The results suggest the presence of a subspace in LLMs focused on capturing part-of-speech tag concepts, resembling patterns observed in lesion studies of the brain in neuroscience.

Paper Structure

This paper contains 14 sections, 7 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: A schematic view of the method: (a) finding key neurons for each category (note the colors); (b) training a classifier for concept analysis.
  • Figure 2: Overlap between selected neurons ($p < 0.05$) for Layer 20 of Llama-3.