Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models

Yiheng Liu, Zhengliang Liu, Zihao Wu, Junhao Ning, Haiyang Sun, Sichen Xia, Yang Yang, Xiaohui Gao, Ning Qiang, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu

TL;DR

This work addresses mechanistic interpretability of large language models by adopting a brain-inspired perspective that seeks functional networks composed of neuron ensembles. By treating MLP activations as fMRI-like signals and applying Independent Component Analysis, the authors identify stable, sparse functional networks that recur across inputs and tasks. Targeted inhibition of these networks degrades performance and language modeling ability, while carefully calibrated amplification can improve task-specific or overall performance, supporting a distributed, ensemble-based computation paradigm. The study highlights cross-task generalization, offers a potential avenue for principled pruning and model fingerprinting, and demonstrates the value of cross-disciplinary methods for understanding LLM internals; code is released at the provided repository.
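The ICA step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes MLP activations have already been collected into a tokens-by-neurons matrix, uses scikit-learn's `FastICA` in a spatial-ICA arrangement (neurons as samples, so each component is a map over neurons), and thresholds z-scored maps to pick "key neurons". The function name `find_functional_networks`, the threshold of 2.0, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA


def find_functional_networks(activations, n_networks=10, z_threshold=2.0, seed=0):
    """Sketch of spatial ICA over neuron activations (hypothetical helper).

    activations: array of shape (n_tokens, n_neurons), analogous to an
                 fMRI matrix of time points by voxels.
    Returns (maps, masks), both of shape (n_networks, n_neurons).
    """
    ica = FastICA(
        n_components=n_networks,
        whiten="unit-variance",
        random_state=seed,
        max_iter=1000,
    )
    # Transpose so neurons are the samples: the recovered sources are then
    # maps over neurons, one map per candidate functional network.
    maps = ica.fit_transform(activations.T).T

    # Z-score each map and keep neurons with large absolute weight.
    # The threshold is an illustrative assumption, not a value from the paper.
    z = (maps - maps.mean(axis=1, keepdims=True)) / maps.std(axis=1, keepdims=True)
    masks = np.abs(z) > z_threshold
    return maps, masks


# Synthetic stand-in for recorded MLP activations: a few planted sparse
# networks mixed by random per-token time courses, plus a little noise.
rng = np.random.default_rng(0)
n_tokens, n_neurons, n_planted = 100, 512, 5
planted_maps = rng.laplace(size=(n_planted, n_neurons))
time_courses = rng.normal(size=(n_tokens, n_planted))
activations = time_courses @ planted_maps + 0.01 * rng.normal(size=(n_tokens, n_neurons))

maps, masks = find_functional_networks(activations, n_networks=n_planted)
print(maps.shape, masks.sum(axis=1))
```

Each row of `masks` marks the neurons assigned to one candidate network. In a real model, `activations` would come from forward hooks on the MLP layers rather than from synthetic data.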

Abstract

In recent years, the rapid advancement of large language models (LLMs) in natural language processing has sparked significant interest among researchers in understanding their mechanisms and functional characteristics. Although prior studies have attempted to explain LLM functionalities by identifying and interpreting specific neurons, these efforts mostly focus on individual neuron contributions, neglecting the fact that human brain functions are realized through intricate interaction networks. Inspired by research on functional brain networks (FBNs) in neuroscience, we apply methodologies established in FBN analysis to explore the "functional networks" within LLMs. Experimental results show that, much like the human brain, LLMs exhibit certain functional networks that recur frequently during their operation. Further investigation reveals that these functional networks are indispensable for LLM performance: inhibiting key functional networks severely impairs the model's capabilities. Conversely, amplifying the activity of neurons within these networks can enhance either the model's overall performance or its performance on specific tasks. This suggests that these functional networks are strongly associated with either specific tasks or the overall performance of the LLM. Code is available at https://github.com/WhatAboutMyStar/LLM_ACTIVATION.
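The inhibition and amplification interventions mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes a boolean mask over neurons for one functional network and treats inhibition as zeroing, amplification as multiplicative scaling, and additive editing as adding a constant. These three definitions, the function name `edit_network_activations`, and the default values are assumptions for illustration; the paper's exact editing rules are not given in this summary.

```python
import numpy as np


def edit_network_activations(activations, mask, mode="inhibit", scale=2.0, add_value=1.0):
    """Return a copy of `activations` with the masked neurons edited.

    activations: array of shape (n_tokens, n_neurons)
    mask:        boolean array of shape (n_neurons,), True for network neurons
    mode:        "inhibit" (assumed: set to zero),
                 "amplify" (assumed: multiply by `scale`),
                 "add"     (assumed: add `add_value`)
    """
    edited = activations.copy()  # leave the caller's array untouched
    if mode == "inhibit":
        edited[:, mask] = 0.0
    elif mode == "amplify":
        edited[:, mask] *= scale
    elif mode == "add":
        edited[:, mask] += add_value
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return edited


# Toy example: two tokens, three neurons, network = neurons 0 and 2.
acts = np.array([[1.0, -2.0, 3.0],
                 [0.5, 0.0, -1.0]])
network_mask = np.array([True, False, True])

inhibited = edit_network_activations(acts, network_mask, mode="inhibit")
amplified = edit_network_activations(acts, network_mask, mode="amplify", scale=2.0)
shifted = edit_network_activations(acts, network_mask, mode="add", add_value=1.0)
```

In practice such an edit would be applied inside a forward hook on the relevant MLP layer, after which perplexity or task accuracy is re-measured to see how much the network matters.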

Paper Structure

This paper contains 23 sections, 15 equations, 18 figures, 13 tables.

Figures (18)

  • Figure 1: The framework of brain-inspired exploration of functional networks within LLMs. (a) Identifying functional brain networks from whole-brain fMRI signals via ICA. (b) Identifying functional networks from the responses of artificial neurons within LLMs.
  • Figure 2: Perplexity results for Qwen2.5-7B-Instruct after inhibiting, one network at a time, the neurons of each of the 10 functional networks obtained from the SST-2 dataset.
  • Figure 3: Zero-shot results for ChatGLM3-6B-base after amplifying the neurons of a functional network obtained from the SST-2 dataset.
  • Figure 4: Example functional networks that are consistent across various types of input samples in group-level analysis. Each row represents the neurons in an MLP layer, with activated neurons highlighted.
  • Figure 5: Perplexity results of additive editing for 10 functional networks in ChatGLM3-6B-base. The added value is set to 1.
  • ...and 13 more figures