Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
Junyan Zhang, Yubo Gao, Yibo Yan, Jungang Li, Zhaorui Hou, Sicheng Tao, Shuliang Liu, Song Dai, Yonghua Hei, Junzhuo Li, Xuming Hu
TL;DR
The paper investigates how fine-tuning enhances instruction-following by reconfiguring sparse components within dense LLMs and Mixture-of-Experts (MoE) models. It introduces HexaInst, a balanced instruction dataset spanning six categories, and SPARCOM, a framework that identifies Instruction-Specific Neurons (ISNs) and Instruction-Specific Experts (ISEs), evaluates their generality and uniqueness, and compares how they change before and after fine-tuning. The study shows that ISNs and ISEs contain both general and unique components, exhibit stable distribution patterns across instruction types, and shift in predictable ways during fine-tuning. This indicates that these sparse substrates play a critical role in instruction execution. The findings advance the mechanistic interpretability of instruction-following and suggest avenues for targeted optimization and trustworthy LLM design. Overall, SPARCOM offers a principled way to map sparse computational substrates to instruction capabilities, and HexaInst enables robust cross-task analysis.
Abstract
Fine-tuning has significantly advanced the instruction-following capabilities of Large Language Models (LLMs), yet the computational mechanisms behind these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instruction dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework with three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of how they change under fine-tuning. Our experiments demonstrate the functional generality and uniqueness of these components, as well as their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work offers deeper insight into how LLMs internalize instruction-following behavior, supporting the development of trustworthy LLMs.
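To make the notions of "general" and "unique" components concrete, the short sketch below partitions components that have already been identified per instruction category. This is an illustrative assumption, not SPARCOM's actual procedure. It treats a component as general if it appears in every category's set, and as unique if it appears in exactly one. The function name split_general_unique, the (layer, neuron_index) id format, and the category labels cat_a/cat_b/cat_c are all hypothetical. The paper's own identification and scoring criteria may differ.

```python
from collections import Counter


def split_general_unique(components_by_category):
    """Hypothetical helper: partition identified components into general vs. unique.

    components_by_category maps an instruction category label to a set of
    hashable component ids, e.g. (layer, neuron_index) tuples for neurons or
    (layer, expert_index) tuples for MoE experts. This format is an assumption.
    """
    all_sets = list(components_by_category.values())
    # General: components shared by every category (set intersection).
    general = set.intersection(*all_sets) if all_sets else set()
    # Count in how many categories each component was identified.
    counts = Counter(c for s in all_sets for c in s)
    # Unique: components identified for exactly one category.
    unique = {
        cat: {c for c in comps if counts[c] == 1}
        for cat, comps in components_by_category.items()
    }
    return general, unique


# Toy input with hypothetical category labels and (layer, neuron_index) ids.
example = {
    "cat_a": {(0, 1), (0, 2), (3, 7)},
    "cat_b": {(0, 1), (0, 2), (5, 5)},
    "cat_c": {(0, 1), (2, 4), (3, 7)},
}
general, unique = split_general_unique(example)
print(general)  # {(0, 1)}
print(unique)   # {'cat_a': set(), 'cat_b': {(5, 5)}, 'cat_c': {(2, 4)}}
```

Components shared by some but not all categories, such as (0, 2) here, fall into neither bucket. A fuller analysis would likely report such partial overlaps as well, for example with pairwise Jaccard similarity between categories.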
