Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

Xufeng Duan; Xinyu Zhou; Bei Xiao; Zhenguang G. Cai

TL;DR

This study probes neuron-level language competence in GPT-2-XL by applying psycholinguistic tasks (sound-shape, sound-gender, implicit causality) and using accumulative direct effect to identify top contributing neurons. Through targeted ablation and activation manipulation, it demonstrates causal links between specific neurons and human-like performance in the sound-gender and implicit causality tasks, while showing no such specialization for the sound-shape task. The findings suggest that certain linguistic abilities in LLMs are supported by identifiable neurons, advancing interpretability by connecting cognitive phenomena to neural substrates. However, the approach reveals limitations in tasks requiring distributed representations and raises questions about generalization to more capable, modern models.

Abstract

As large language models (LLMs) advance in their linguistic capacity, understanding how they capture aspects of language competence remains a significant challenge. This study therefore employs psycholinguistic paradigms in English, which are well-suited for probing deeper cognitive aspects of language processing, to explore neuron-level representations in a language model across three tasks: sound-shape association, sound-gender association, and implicit causality. Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality. Targeted neuron ablation and activation manipulation reveal a crucial relationship: when GPT-2-XL displays a linguistic ability, specific neurons correspond to that competence; conversely, the absence of such an ability indicates a lack of specialized neurons. This study is the first to utilize psycholinguistic experiments to investigate deep language competence at the neuron level, providing a new level of granularity in model interpretability and insights into the internal mechanisms driving language ability in a transformer-based LLM.
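To make the two manipulations concrete, the following is a minimal sketch of neuron ablation and double activation on a toy one-hidden-layer network. It is not the authors' code and does not touch GPT-2-XL: the weights `W_IN` and `W_OUT`, the input `X`, the ReLU nonlinearity, and the helper `logit_diff` are all hypothetical, chosen only so the arithmetic can be followed by hand. Ablation is modeled as multiplying a hidden neuron's activation by 0, and double activation as multiplying it by 2; the effect is read off as the change in the target-minus-distractor logit.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def logit_diff(x, w_in, w_out, target, distractor, scale=None):
    """Target-minus-distractor logit for a toy one-hidden-layer network.

    scale maps a hidden-neuron index to a multiplier applied to its
    activation: 0.0 ablates the neuron, 2.0 doubles its activation.
    """
    hidden = relu(matvec(w_in, x))
    for idx, factor in (scale or {}).items():
        hidden[idx] *= factor
    logits = matvec(w_out, hidden)
    return logits[target] - logits[distractor]


# Hypothetical toy weights: 2 inputs, 3 hidden neurons, 2 output tokens.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_OUT = [[2.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
X = [1.0, 1.0]

baseline = logit_diff(X, W_IN, W_OUT, target=0, distractor=1)
ablated = logit_diff(X, W_IN, W_OUT, 0, 1, scale={0: 0.0})
doubled = logit_diff(X, W_IN, W_OUT, 0, 1, scale={0: 2.0})

print(baseline, ablated, doubled)  # 1.0 -1.0 3.0
```

In this toy setting hidden neuron 0 feeds only the target token, so removing it flips the preference toward the distractor and doubling it widens the gap. A neuron with no such specialization would leave the difference roughly unchanged under both manipulations.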

Paper Structure (20 sections, 1 equation, 3 figures)

Introduction
Related Work
Interpretability of Large Language Models
Psycholinguistics and Neural Representations
Neuron Ablation and Activation Techniques
Methodology
Experimental Setup
Sound-Shape Association Task
Sound-Gender Association Task
Implicit Causality Task
Neuron Selection Process
Neuron Ablation Procedure
Neuron Activation Enhancement
Results
Human Response Replication
...and 5 more sections

Figures (3)

Figure 1: The contribution proportion of the top 50 neurons across three different experiments. Each point represents a neuron. The top 5 are highlighted with black circular outlines.
Figure 2: Model performance of GPT-2-XL on three psycholinguistic tasks, measured by the logits difference between the target and distractor for each task.
Figure 3: Neuron manipulation results. Effect of neuron manipulation on GPT-2-XL performance, comparing ablation and double activation of the top 5 and top 50 neurons across three psycholinguistic tasks.
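As a rough illustration of the quantity plotted in Figure 1, the sketch below ranks neurons by an effect score and reports each top neuron's share of the total. This is an assumption about how a "contribution proportion" could be computed, not the paper's definition: the function `top_k_proportions`, the neuron identifiers, and the scores are hypothetical, and the scores are assumed to be non-negative and already accumulated over stimuli.

```python
def top_k_proportions(effects, k):
    """Return [(neuron_id, share)] for the k neurons with the largest effect.

    effects maps a neuron id to a non-negative effect score (assumed to be
    already accumulated over stimuli). share is score / total over all neurons.
    """
    total = sum(effects.values())
    if total == 0:
        return []
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return [(nid, score / total) for nid, score in ranked[:k]]


# Hypothetical scores keyed by made-up "layer.neuron" identifiers.
effects = {"L0.N3": 4.0, "L1.N7": 2.0, "L2.N1": 1.0, "L2.N5": 1.0}
top2 = top_k_proportions(effects, 2)

print(top2)  # [('L0.N3', 0.5), ('L1.N7', 0.25)]
```

The returned list can then serve as the set of candidate neurons for ablation or double activation, for example the top 5 or top 50 as in Figure 3.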
