
Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts

Lihu Chen, Adam Dejl, Francesca Toni

TL;DR

This work introduces QRNCA, a scalable, architecture-agnostic framework for identifying query-relevant (QR) neurons in decoder-only LLMs during long-form text generation by recasting prompts as multi-choice QA. Through neuron attribution, inverse cluster attribution, and common-neuron filtering, QRNCA isolates QR neurons that influence specific knowledge expressions; it is validated on domain and language datasets with Llama-2-7B and Mistral-7B. The study reveals localized knowledge regions, with domain-specific neurons clustering in middle layers and language-specific neurons being more dispersed, and demonstrates practical applications in knowledge editing and neuron-based prediction. Overall, QRNCA advances understanding of how knowledge is stored and manipulated in large language models and provides a concrete tool for targeted knowledge editing and interpretability.

Abstract

Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge. Previous work has largely focused on locating entity-related (often single-token) facts in smaller models. However, several key questions remain unanswered: (1) How can we effectively locate query-relevant neurons in decoder-only LLMs, such as Llama and Mistral? (2) How can we address the challenge of long-form (or free-form) text generation? (3) Are there localized knowledge regions in LLMs? In this study, we introduce Query-Relevant Neuron Cluster Attribution (QRNCA), a novel architecture-agnostic framework capable of identifying query-relevant neurons in LLMs. QRNCA allows for the examination of long-form answers beyond triplet facts by employing the proxy task of multi-choice question answering. To evaluate the effectiveness of our detected neurons, we build two multi-choice QA datasets spanning diverse domains and languages. Empirical evaluations demonstrate that our method outperforms baseline methods significantly. Further, analysis of neuron distributions reveals the presence of visible localized regions, particularly within different domains. Finally, we show potential applications of our detected neurons in knowledge editing and neuron-based prediction.

Paper Structure

This paper contains 29 sections, 7 equations, 9 figures, and 10 tables.

Figures (9)

  • Figure 1: Our overall framework, which aims to detect Query-Relevant (QR) neurons with regard to specific queries.
  • Figure 2: Overlap rates and layer distributions of found QR neurons.
  • Figure 3: Percentage change in the probability of the correct answer when boosting QR neurons. The LLM here is Llama-2-7B (Touvron et al., 2023). The suppression results are shown in a separate figure in the SM.
  • Figure 4: Geographical heatmap of detected QR neurons for different domains and languages. The value is calculated by our $\text{naica}(n_{i}^{l})$. Brighter colors indicate higher $\text{naica}$ values. The LLM here is Llama-2-7B (11008 $\times$ 32) (Touvron et al., 2023).
  • Figure A1: UMAP visualisation of $\mathbf{W}^{D}$ vectors associated with the QR neurons and the token unembeddings.
  • ...and 4 more figures