Revisiting Large Language Model Pruning using Neuron Semantic Attribution
Yizhuo Ding, Xinwei Sun, Yanwei Fu, Guosheng Hu
TL;DR
The paper investigates the generalizability of large language model pruning across diverse tasks and datasets, revealing that calibration data largely shapes pruning outcomes and that sentiment classification can suffer substantial drops under common pruning regimes. It evaluates three post-training pruning methods—SparseGPT, Wanda, and RIA—across 14 models, 24 datasets, and four task categories, using accuracy as the primary metric and examining factors like sparsity and sequence length. To explain pruning-induced performance changes, the authors introduce Neuron Semantic Attribution (NSA), a framework that links neuron activations to influential input semantics via a three-step process (influential word selection, neuron–word matching, and unpruned-vs-pruned comparison) and demonstrate NSA visualizations on Yelp and ARC-C data. The results emphasize task- and data-dependent effects, show that calibration data can dramatically alter pruning efficacy, and provide actionable insights for designing more robust, interpretable pruning methods and calibration-data strategies with practical implications for deploying compressed LLMs. Overall, the work advances both the empirical understanding of pruning generalization and the interpretability of pruning decisions through neuron–semantics mappings.
Abstract
Model pruning is vital for accelerating large language models by reducing their size and computational requirements. However, the generalizability of existing pruning methods across diverse datasets and tasks remains unclear. We therefore conduct extensive evaluations on 24 datasets and 4 tasks using popular pruning methods. Based on these evaluations, we find, and then investigate, that the calibration set greatly affects the performance of pruning methods. In addition, we surprisingly find a significant performance drop for existing pruning methods on sentiment classification tasks. To understand the link between the performance drop and the pruned neurons, we propose Neuron Semantic Attribution, which learns to associate each neuron with specific semantics. This method is the first to make the unpruned neurons of LLMs explainable.
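To illustrate why the calibration set can change which weights are pruned, the following is a minimal sketch of a Wanda-style importance score, in which each weight magnitude is scaled by the L2 norm of its input feature over the calibration activations and weights are compared within each output row. The function name `wanda_style_mask`, the toy layer shape, and the two synthetic calibration sets are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
import numpy as np


def wanda_style_mask(weight, calib_acts, sparsity=0.5):
    """Return a boolean keep-mask for `weight` (out_features x in_features).

    Score: |W_ij| * ||X_j||_2, where X_j is input feature j collected over
    all calibration tokens. Weights are ranked within each output row.
    """
    feat_norm = np.linalg.norm(calib_acts, axis=0)   # shape: (in_features,)
    score = np.abs(weight) * feat_norm[None, :]      # shape: (out, in)
    n_prune = int(weight.shape[1] * sparsity)
    # Indices of the n_prune lowest-scoring weights in each row.
    prune_idx = np.argsort(score, axis=1)[:, :n_prune]
    mask = np.ones_like(weight, dtype=bool)
    np.put_along_axis(mask, prune_idx, False, axis=1)
    return mask


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                         # toy linear layer

# Two hypothetical calibration sets whose feature scales are reversed.
scale_a = np.linspace(0.1, 2.0, 16)
X_a = rng.normal(size=(128, 16)) * scale_a
X_b = rng.normal(size=(128, 16)) * scale_a[::-1]

mask_a = wanda_style_mask(W, X_a)
mask_b = wanda_style_mask(W, X_b)
agreement = (mask_a == mask_b).mean()
print(f"fraction of weights with the same keep/prune decision: {agreement:.2f}")
```

The same weight matrix receives different masks under the two calibration sets because the activation norms reweight every column before ranking. This is the mechanism through which calibration data enters activation-aware pruning criteria of this kind.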
