Sample-aware Adaptive Structured Pruning for Large Language Models

Jun Kong, Xinge Ma, Jin Wang, Xuejie Zhang

TL;DR

AdaPruner tackles the practical challenge of pruning large language models by treating structure removal as a joint search over a calibration-data subspace and an importance-metric subspace. It constructs this two-part pruning solution space, employs Bayesian optimization with a Tree-structured Parzen Estimator (BO-TPE) to adaptively identify high-quality calibration data and metrics, and then prunes the model and fine-tunes it with LoRA. Empirical results show that AdaPruner outperforms existing structured pruning methods on LLaMA-7B and Vicuna-7B under multiple pruning ratios, preserving much of the language modeling and zero-shot commonsense reasoning performance (e.g., 97% retention at a 20% pruning ratio in some settings). The approach is robust, generalizes across models, and reduces the manual effort of designing pruning configurations; ablations confirm the importance of the calibration data, the joint optimization, and the Bayesian search.
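The search step can be illustrated with a minimal, self-contained sketch of an independent-parameter TPE loop over a discrete solution space. Everything concrete here is an assumption for illustration: the option names in `SPACE`, the hypothetical `toy_objective` (a stand-in for pruning the model with a given configuration and then measuring, for example, validation perplexity), and the hyperparameters `gamma`, `n_startup`, and `n_candidates`. It is not the authors' implementation.

```python
import math
import random

# Hypothetical discrete solution space (illustrative names only): one
# calibration-data option paired with one importance-metric option.
SPACE = {
    "calib_subset": ["subset_a", "subset_b", "subset_c", "subset_d"],
    "metric": ["magnitude", "taylor_first", "taylor_second", "mixed"],
}


def toy_objective(cfg):
    """Hypothetical stand-in for 'prune with cfg, then evaluate' (lower is better)."""
    calib_cost = {"subset_a": 3.0, "subset_b": 1.0, "subset_c": 2.0, "subset_d": 2.5}
    metric_cost = {"magnitude": 2.0, "taylor_first": 0.5, "taylor_second": 1.0, "mixed": 1.5}
    return calib_cost[cfg["calib_subset"]] + metric_cost[cfg["metric"]]


def categorical_density(observations, key, choices, prior=1.0):
    """Smoothed frequency of each choice for `key` among (cfg, loss) observations."""
    counts = {c: prior for c in choices}
    for cfg, _ in observations:
        counts[cfg[key]] += 1.0
    total = sum(counts.values())
    return {c: counts[c] / total for c in choices}


def tpe_search(objective, space, n_trials=30, n_startup=8, gamma=0.25,
               n_candidates=24, seed=0):
    """Independent-parameter TPE over categorical dimensions; returns (cfg, loss)."""
    rng = random.Random(seed)
    history = []
    for t in range(n_trials):
        if t < n_startup:
            # Warm-up: uniform random configurations.
            cfg = {k: rng.choice(v) for k, v in space.items()}
        else:
            # Split past trials into a "good" top-gamma fraction and the rest.
            ranked = sorted(history, key=lambda item: item[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good, bad = ranked[:n_good], ranked[n_good:]
            l = {k: categorical_density(good, k, v) for k, v in space.items()}
            g = {k: categorical_density(bad, k, v) for k, v in space.items()}
            # Draw candidates from l(x) and keep the one maximizing l(x) / g(x).
            best_cfg, best_score = None, -math.inf
            for _ in range(n_candidates):
                cand = {k: rng.choices(v, weights=[l[k][c] for c in v])[0]
                        for k, v in space.items()}
                score = sum(math.log(l[k][cand[k]] / g[k][cand[k]]) for k in space)
                if score > best_score:
                    best_cfg, best_score = cand, score
            cfg = best_cfg
        history.append((cfg, objective(cfg)))
    return min(history, key=lambda item: item[1])


best_cfg, best_loss = tpe_search(toy_objective, SPACE)
print(best_cfg, best_loss)
```

With only 16 combinations an exhaustive sweep would be cheaper; a model-based sampler like this pays off only when each evaluation is expensive and the space is too large to enumerate, as happens when every trial requires pruning and evaluating an LLM.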

Abstract

Large language models (LLMs) have achieved outstanding performance in natural language processing, but their enormous size and high computational cost limit practical deployment. Structured pruning can effectively reduce the resource demands of deployment by removing redundant model parameters. However, existing structured pruning methods rely on randomly selected calibration data and a single, fixed importance estimation metric, which degrades the performance of the pruned models. This study introduces AdaPruner, a sample-aware adaptive structured pruning framework for LLMs that optimizes both the calibration data and the importance estimation metrics used in the structured pruning process. Specifically, AdaPruner constructs a structured pruning solution space and then employs Bayesian optimization to adaptively search it for the optimal calibration data and importance estimation metrics, which are then used to remove redundant parameters from the LLM. Experimental results show that AdaPruner outperforms existing structured pruning methods on a family of LLMs at varying pruning ratios, demonstrating its applicability and robustness. Remarkably, at a 20% pruning ratio, the model pruned with AdaPruner retains 97% of the performance of the unpruned model.
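To make "importance estimation metric" and "pruning ratio" concrete, the sketch below scores groups of weights (rows of one weight matrix, standing in for channels or heads) with a weighted mix of two illustrative metrics and selects the lowest-scoring fraction for removal. The two metrics (L2 magnitude and a first-order Taylor estimate |gᵀw|), the mixing weight `alpha`, the helper names, and the toy weights and gradients are all hypothetical choices for illustration, not the metric subspace defined in the paper; applying the ratio to the groups of a single layer is also a simplification.

```python
import math


def l2_magnitude(row):
    """L2 norm of a weight group."""
    return math.sqrt(sum(w * w for w in row))


def taylor_first_order(row, grad_row):
    """|g^T w|: first-order estimate of the loss change when the whole group is zeroed."""
    return abs(sum(w * g for w, g in zip(row, grad_row)))


def normalize(scores):
    """Scale scores to sum to 1 so the two metrics are comparable before mixing."""
    total = sum(scores)
    return [s / total if total > 0 else 0.0 for s in scores]


def select_groups_to_prune(weights, grads, ratio, alpha):
    """Return sorted indices of the lowest-scoring `ratio` fraction of groups.

    Score = alpha * normalized magnitude + (1 - alpha) * normalized Taylor score.
    """
    mag = normalize([l2_magnitude(r) for r in weights])
    tay = normalize([taylor_first_order(r, g) for r, g in zip(weights, grads)])
    combined = [alpha * m + (1 - alpha) * t for m, t in zip(mag, tay)]
    n_prune = int(round(ratio * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: combined[i])
    return sorted(order[:n_prune])


# Toy layer: 5 groups (rows) of 2 weights each, with matching toy gradients.
W = [[0.1, 0.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0], [0.0, 0.3]]
G = [[0.2, 0.1], [0.1, 0.1], [0.4, 0.4], [0.05, 0.0], [0.0, 1.0]]

print(select_groups_to_prune(W, G, ratio=0.4, alpha=1.0))  # [0, 4]
print(select_groups_to_prune(W, G, ratio=0.4, alpha=0.0))  # [0, 2]
print(select_groups_to_prune(W, G, ratio=0.2, alpha=0.5))  # [0]
```

On this toy data the magnitude-only setting removes rows 0 and 4, while the Taylor-only setting removes rows 0 and 2, because row 2's weight-gradient products cancel. That disagreement illustrates why the choice of importance metric can change which structures are pruned.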

Paper Structure

This paper contains 35 sections, 12 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The impact of calibration data and importance estimation metrics.
  • Figure 2: Overview of the AdaPruner Framework.
  • Figure 3: Average performance of structured pruning on LLaMA-7B at 20% pruning ratio.
  • Figure 4: Comparison of Bayesian and stochastic optimization processes.
  • Figure 5: Average performance of structured pruning for Vicuna-7B at 20% pruning ratio and LLaMA-7B at 50% pruning ratio.