Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

Junyan Zhang, Yubo Gao, Yibo Yan, Jungang Li, Zhaorui Hou, Sicheng Tao, Shuliang Liu, Song Dai, Yonghua Hei, Junzhuo Li, Xuming Hu

TL;DR

The paper investigates how fine-tuning enhances instruction-following by reconfiguring sparse components within dense LLMs and MoE models. It introduces HexaInst, a balanced six-category instruction dataset, and SPARCOM, a framework that identifies Instruction-Specific Neurons (ISNs) and Instruction-Specific Experts (ISEs), evaluates their generality and uniqueness, and compares how they change before and after fine-tuning. The study demonstrates that ISNs and ISEs exhibit both general and unique components, show stable distribution patterns across instruction types, and shift in predictable ways during fine-tuning, indicating that these sparse substrates play a critical role in instruction execution. These findings advance mechanistic interpretability of instruction-following and suggest avenues for targeted optimization and trustworthy LLM design. Overall, SPARCOM provides a principled method to map sparse computational substrates to instruction capabilities, with HexaInst enabling robust cross-task analysis.

Abstract

The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instructional dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate functional generality, uniqueness, and the critical role of these components in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work offers the trustworthy LLM community deeper insights into how LLMs internalize instruction-following behavior.
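
To make the notions of "generality" and "uniqueness" concrete, the following is a minimal sketch and not the paper's actual SPARCOM procedure, which this summary does not specify. It assumes that some identification step has already produced, for each instruction category, a set of neurons written as (layer, index) pairs. The function names, the category labels, and the toy data are all hypothetical. Under that assumption, general components can be read as neurons shared by every category, and unique components as neurons that appear in exactly one category.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a neuron is a (layer index, neuron index) pair.
Neuron = Tuple[int, int]


def general_and_unique(isns: Dict[str, Set[Neuron]]):
    """Split per-category neuron sets into shared and category-only parts.

    Assumption: `isns` maps each instruction category to the set of neurons
    already identified for it. How they are identified is out of scope here.
    """
    categories = list(isns)
    # "General": neurons present in every category.
    general = set.intersection(*(isns[c] for c in categories)) if categories else set()
    # "Unique": neurons present in one category and in no other.
    unique = {}
    for c in categories:
        others = set().union(*(isns[o] for o in categories if o != c))
        unique[c] = isns[c] - others
    return general, unique


def jaccard(a: Set[Neuron], b: Set[Neuron]) -> float:
    """Overlap between two neuron sets as intersection over union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data with placeholder category names (not HexaInst's real categories).
toy = {
    "category_a": {(0, 1), (0, 2), (3, 7)},
    "category_b": {(0, 1), (3, 7), (5, 4)},
    "category_c": {(0, 1), (2, 9)},
}

general, unique = general_and_unique(toy)
overlap_ab = jaccard(toy["category_a"], toy["category_b"])
```

The same set operations would apply to experts in an MoE model if each expert were identified by a (layer, expert index) pair; comparing the sets obtained from a base model and its fine-tuned counterpart would be one simple way to quantify alterations.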

Paper Structure

This paper contains 40 sections, 16 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparison of research focuses between Language-Specific Neurons (a) and Instruction-Specific Neurons & Experts in dense LLMs & MoE models (b).
  • Figure 2: The SPARCOM framework, which comprises three elements and targets the identification and evaluation of sparse components. ISNs and ISEs denote Instruction-Specific Neurons and Instruction-Specific Experts.
  • Figure 3: Overlaps and differences in ISN distribution across same-type and different-type instructions on LLaMA-2-Chat-7B, LLaMA-2-Chat-13B, Mistral-7B-Instruct-v0.1, and Qwen1.5-MoE-A2.7B-Chat.
  • Figure 4: Overlaps and differences in ISE distribution across same-type and different-type instructions on Qwen1.5-MoE-A2.7B-Chat.
  • Figure 5: Hierarchical distribution of ISNs across different layers. The upper part covers the LLaMA-2-Chat-7B, LLaMA-2-Chat-13B, Mistral-7B-Instruct-v0.1, and Qwen1.5-MoE-A2.7B-Chat models. The lower part covers the LLaMA-2-7B, LLaMA-2-13B, Mistral-7B-v0.1, and Qwen1.5-MoE-A2.7B models.
  • ...and 1 more figure