Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

Itay Yona, Dan Barzilay, Michael Karasik, Mor Geva

Abstract

Language models can answer many entity-centric factual questions, but it remains unclear which internal mechanisms are involved in this process. We study this question across multiple language models. We localize entity-selective MLP neurons using templated prompts about each entity, and then validate them with causal interventions on PopQA-based QA examples. On a curated set of 200 entities drawn from PopQA, localized neurons concentrate in early layers. Negative ablation produces entity-specific amnesia, while controlled injection at a placeholder token improves answer retrieval relative to mean-entity and wrong-cell controls. For many entities, activating a single localized neuron is sufficient to recover entity-consistent predictions once the context is initialized, consistent with compact entity retrieval rather than purely gradual enrichment across depth. Robustness to aliases, acronyms, misspellings, and multilingual forms supports a canonicalization interpretation. The effect is strong but not universal: not every entity admits a reliable single-neuron handle, and coverage is higher for popular entities. Overall, these results identify sparse, causally actionable access points for analyzing and modulating entity-conditioned factual behavior.
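The localization step described above — scoring MLP neurons by how selectively they respond to templated prompts about one entity versus others — can be sketched as follows. This is a minimal illustration with synthetic activations, not the paper's exact procedure: the difference-of-means score, the array shapes, and the planted signal are all our assumptions.

```python
import numpy as np

# Synthetic per-entity activation maps: acts[e][layer, neuron] is the mean
# MLP activation over templated prompts about entity e. Shapes and the
# planted selective unit are illustrative assumptions.
rng = np.random.default_rng(0)
n_layers, n_neurons = 4, 64
entities = ["Barack Obama", "Donald Trump"]
acts = {e: rng.standard_normal((n_layers, n_neurons)) for e in entities}
# Plant a strongly entity-selective unit so the sketch has a clear answer.
acts["Barack Obama"][2, 10] += 20.0

def top_cell(target, acts):
    """Score each (layer, neuron) by target activation minus the mean
    activation over all other entities; return the argmax coordinates."""
    others = np.mean([a for e, a in acts.items() if e != target], axis=0)
    score = acts[target] - others
    layer, neuron = np.unravel_index(np.argmax(score), score.shape)
    return int(layer), int(neuron)

print(top_cell("Barack Obama", acts))  # → (2, 10), given the planted signal
```

In the paper's setting, the activations would come from running the model on entity-specific prompt templates rather than from random draws; the argmax here plays the role of the reported top localized cell (e.g. L2-N10941 for Obama).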

Paper Structure

This paper contains 38 sections, 5 equations, 24 figures, 4 tables, and 3 algorithms.

Figures (24)

  • Figure 1: We identify sparse, entity-selective MLP neurons, termed entity cells, that act as stable anchors for factual retrieval in Qwen2.5-7B. Concentrated primarily in early layers (0--5), these cells provide access to canonical identity representations that are robust to aliases, misspellings, and multilingual variants. These neurons serve as causally actionable access points: suppressing them induces entity-specific amnesia, while activating a single localized neuron is often sufficient to steer the model toward entity-consistent factual recall. Across the other six models in our suite, early-layer candidates also appear, though the causal validation is weaker.
  • Figure 2: Layer of the top localized cell for each PopQA-200 entity in Qwen2.5-7B base (n=200). Similar early-layer concentration is observed across other tested models; see Appendix \ref{app:cross_model}.
  • Figure 3: Entity-specific amnesia under negative ablation for the localized Obama cell (L2-N10941). Target (Obama) recall drops substantially as $\alpha$ decreases, while control (Trump) remains near baseline.
  • Figure 4: Controlled injection at the placeholder token X, evaluated on instances where the entity-present prompt is already correct under pass@5 (109 examples). Mean-entity initialization and wrong-cell injection are control conditions; correct-cell injection shows the expected directional gain.
  • Figure 5: Variant robustness for "Barack Obama": most spelling and phrasing perturbations keep the same localized cell (L2-N10941).
  • ...and 19 more figures
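The interventions in Figures 3 and 4 — suppressing or amplifying a single localized neuron by a scale factor $\alpha$ — can both be implemented as one forward hook on the MLP activation. The sketch below uses a toy transformer-style block so it runs standalone; the block architecture, coordinates, and hook placement are our assumptions, standing in for the paper's actual models and cells (e.g. L2-N10941).

```python
import torch
import torch.nn as nn

# Hypothetical coordinates for an "entity cell"; alpha=0.0 ablates the
# neuron (the negative-ablation setting), alpha>1 would amplify it.
LAYER, NEURON, ALPHA = 0, 7, 0.0

class ToyBlock(nn.Module):
    """Minimal residual MLP block standing in for a transformer layer."""
    def __init__(self, d_model=16, d_mlp=32):
        super().__init__()
        self.up = nn.Linear(d_model, d_mlp)
        self.act = nn.GELU()
        self.down = nn.Linear(d_mlp, d_model)

    def forward(self, x):
        return x + self.down(self.act(self.up(x)))

def make_scaling_hook(neuron, alpha):
    """Scale one neuron's post-activation value by alpha at every token."""
    def hook(module, inputs, output):
        output = output.clone()
        output[..., neuron] = alpha * output[..., neuron]
        return output  # returned value replaces the module's output
    return hook

torch.manual_seed(0)
model = nn.ModuleList([ToyBlock() for _ in range(3)])
x = torch.randn(1, 4, 16)  # (batch, tokens, d_model)

# Hook the chosen block's MLP activation, run the stack, then clean up.
handle = model[LAYER].act.register_forward_hook(make_scaling_hook(NEURON, ALPHA))
h = x
for block in model:
    h = block(h)
handle.remove()
```

For the injection condition of Figure 4, the same hook with a large positive $\alpha$ (or a direct write of the cell's typical activation) would be applied at the placeholder token's position only, rather than at every token as in this simplified sketch.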