What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu; Andrew Liu; Zining Zhu; Gerald Penn

TL;DR

The KN thesis does not adequately explain the process of factual expression. Understanding that process requires looking beyond the MLP weights to recent models' complex layer structures and attention mechanisms.

Abstract

We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of large language models to recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the language model's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model editing methods (Dai et al., 2022; Meng et al., 2022). We find that this thesis is, at best, an oversimplification. Not only have we found that we can edit the expression of certain linguistic phenomena using the same model editing methods but, through a more comprehensive evaluation, we have found that the KN thesis does not adequately explain the process of factual expression. While it is possible to argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.
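
The key-value reading of an MLP block described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `mlp_as_memory`, the `suppress` argument, and the toy weights are all hypothetical, chosen only to show what "recalling through MLP weights" and "modifying the MLP modules" mean mechanically. The abstract itself argues that this picture is, at best, an oversimplification.

```python
def relu(z):
    return max(0.0, z)


def mlp_as_memory(x, keys, values, suppress=()):
    """Feed-forward block read as a key-value memory (illustrative only).

    Hidden neuron i has a key vector keys[i] and a value vector values[i].
    The activation relu(x . keys[i]) acts as a match score, and the output
    is the activation-weighted sum of the value vectors. Neurons whose
    index is in `suppress` have their activation forced to zero.
    """
    dim_out = len(values[0])
    out = [0.0] * dim_out
    for i, (k, v) in enumerate(zip(keys, values)):
        if i in suppress:
            a = 0.0
        else:
            a = relu(sum(xj * kj for xj, kj in zip(x, k)))
        for d in range(dim_out):
            out[d] += a * v[d]
    return out


# Toy weights (made up for illustration): two neurons, 2-d input, 2-d output.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 3.0]]
x = [1.0, 1.0]

full = mlp_as_memory(x, keys, values)                  # both neurons fire
edited = mlp_as_memory(x, keys, values, suppress={0})  # neuron 0 erased
```

In this toy setting, suppressing neuron 0 removes exactly its value vector's contribution to the output, which is the kind of intervention that KN-inspired editing relies on.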

Paper Structure (35 sections, 2 equations, 13 figures, 10 tables)

Introduction
The Knowledge Neuron Thesis
Evaluating the KN Thesis: an Overview
Evaluating the KN Thesis on Syntactic Phenomena
Editing Syntactic Phenomena & the "Formal vs Functional" Distinction
Localising Syntactic Phenomena in Language Models
Methods: Searching for KNs of Syntactic Phenomena
Neuron Attribution Score
Measuring the Level of Localisation
Results & Findings
Finding 1: We can localise the grammatical number of determiners to just two neurons, just like factual information.
Effects of Suppressing the "Number Neuron"
Finding 2: KNs obtained using linguistic tasks and factual tasks share similar characteristics of localisation.
Finding 3: Despite the high level of localisation in the underlying probability drift, the effect of editing the KNs is not enough to overturn the categorical predictions made by the language model.
Discussion
...and 20 more sections

Figures (13)

Figure 1: Syntactic phenomena can be located and edited using existing model editing methods. The integrated gradients of singular determiners (this, that) and plural determiners (these, those) form two distinct groups. Erasing these neurons leads to output probability changes.
Figure 2: Localising grammatical number to KNs. The singular determiners share a common KN ($w^{(10)}_{2096}$), and the plural determiners share a different common KN ($w^{(9)}_{1094}$).
Figure 3: The effect of suppressing the number neurons (singular: $w^{(10)}_{2096}$; plural: $w^{(9)}_{1094}$) across number-expressing prenominal modifiers. Significant ($p<0.05$) changes are highlighted in red. The three sections in the plots are, from left to right, plural, singular and neutral modifiers.
Figure 4: The localisation of plurality appeals to word co-occurrence frequency cues.
Figure 5: The localisation of certain syntactic phenomena (BLiMP) is comparable to facts (ParaRel). We see comparable localisation metrics and the identified KNs occupy the same layers.
...and 8 more figures
