Discovering Decoupled Functional Modules in Large Language Models

Yanke Yu, Jin Li, Ying Sun, Ping Li, Zhefeng Wang, Yi Zheng

Abstract

Understanding the internal functional organization of Large Language Models (LLMs) is crucial for improving their trustworthiness and performance. However, how LLMs organize different functions into modules remains largely unexplored. To bridge this gap, we formulate a functional module discovery problem and propose an Unsupervised LLM Cross-layer MOdule Discovery (ULCMOD) framework that simultaneously disentangles the large set of neurons in the entire LLM into modules while discovering the topics of input samples related to these modules. Our framework introduces a novel objective function and an efficient Iterative Decoupling (IterD) algorithm. Extensive experiments show that our method discovers high-quality, disentangled modules that capture more meaningful semantic information and achieve superior performance in various downstream tasks. Moreover, our qualitative analysis reveals that the discovered modules are semantically coherent, correspond to interpretable specializations, and exhibit a clear spatial and hierarchical organization within the LLM. Our work provides a novel tool for interpreting the functional modules of LLMs, filling a critical gap in LLM interpretability research.

Paper Structure

This paper contains 28 sections, 9 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Illustrations of brain modules vs. LLM modules.
  • Figure 2: Illustration of function modules. Each red block represents a function module, associated with a set of neurons and a set of representative samples employing it.
  • Figure 3: Overview of the IterD optimization process.
  • Figure 4: Average activation heatmap for discovered modules in Qwen2.5-7B-Instruct ($K=10$). Each cell $(i, j)$ shows the average activation of neuron set $U_j$ on sample set $S_i$.
  • Figure 5: Visualization of sample category similarity in Qwen2.5-7B-Instruct ($K=10$). The value in cell $(i, j)$ represents the average cosine similarity between the feature vectors ($\mathbf{x}_{s}$) of samples from category $i$ and category $j$.
  • ...and 3 more figures
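
The quantities described in the captions of Figures 4 and 5 can be illustrated with a short NumPy sketch. This is not the authors' code: the function names, the array layout (one row per sample, one column per neuron), and the integer label arrays standing in for the sets $S_i$ and $U_j$ are all assumptions made for illustration. How ULCMOD assigns neurons and samples to modules is not covered here.

```python
import numpy as np


def module_activation_heatmap(acts, sample_labels, neuron_labels, k):
    """Average activation of neuron set U_j on sample set S_i (cf. Figure 4).

    acts          : (n_samples, n_neurons) activation matrix (assumed layout)
    sample_labels : (n_samples,) module index of each sample, in [0, k)
    neuron_labels : (n_neurons,) module index of each neuron, in [0, k)
    Both label arrays are hypothetical inputs; empty sets yield NaN.
    """
    heat = np.full((k, k), np.nan)
    for i in range(k):
        rows = sample_labels == i
        for j in range(k):
            cols = neuron_labels == j
            if rows.any() and cols.any():
                # Mean over every (sample in S_i, neuron in U_j) pair.
                heat[i, j] = acts[np.ix_(rows, cols)].mean()
    return heat


def category_cosine_similarity(features, categories, n_categories):
    """Average pairwise cosine similarity between categories (cf. Figure 5).

    features   : (n_samples, d) feature vectors x_s, assumed to be nonzero
    categories : (n_samples,) category index of each sample
    Diagonal cells include each sample's similarity with itself (1.0);
    whether the paper excludes those pairs is not stated in the caption.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit @ unit.T  # cosine similarity for all sample pairs
    out = np.full((n_categories, n_categories), np.nan)
    for i in range(n_categories):
        for j in range(n_categories):
            block = sims[np.ix_(categories == i, categories == j)]
            if block.size:
                out[i, j] = block.mean()
    return out
```

With well-separated modules, the heatmap from `module_activation_heatmap` should be dominated by its diagonal, since each sample set would mainly activate its own neuron set.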