Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

TL;DR

Pruner-Zero prunes large language models using symbolic pruning metrics that it discovers automatically through genetic programming. Framing pruning-metric design as symbolic regression, it builds a unified search space that captures existing metrics and applies an Opposing Operation Simplification (OOS) strategy to reduce redundancy. The framework evolves pruning metrics and evaluates them via post-training pruning perplexity on WikiText2, achieving state-of-the-art results on LLaMA and LLaMA-2 without weight updates, across both unstructured and structured sparsity regimes. Analyses reveal practical design principles for pruning metrics and demonstrate robust generalization across model families, including zero-shot and in-context learning scenarios. By automating metric discovery and avoiding retraining, the work lowers the barrier to effective LLM pruning, with potential impact on deployment efficiency and accessibility of large models.

Abstract

Despite their remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate inference, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover potential symbolic pruning metrics. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. Extensive experiments with LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that Pruner-Zero outperforms SOTA post-training pruning methods. Code at: https://github.com/pprp/Pruner-Zero.
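
To make "post-training pruning with a metric" concrete, the following is a minimal sketch, not the Pruner-Zero implementation. It scores each weight with a hand-designed metric of the kind the search space is said to encompass (weight magnitude times the input activation norm, as in Wanda) and zeroes the lowest-scoring half of each row without updating the remaining weights. The function names, the toy weight matrix, and the calibration data are all assumptions made for illustration.

```python
import math

def column_norms(activations):
    """L2 norm of each input feature over the calibration samples."""
    n_features = len(activations[0])
    return [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_features)]

def wanda_like_metric(weight, act_norm):
    """Score each weight as |w_ij| * ||x_j||; an illustrative hand-designed metric."""
    return [[abs(w) * act_norm[j] for j, w in enumerate(row)] for row in weight]

def prune_by_metric(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in each row; no weight updates."""
    pruned = []
    for row, row_scores in zip(weight, scores):
        n_prune = int(len(row) * sparsity)
        # Indices of the n_prune smallest scores in this row.
        drop = set(sorted(range(len(row)), key=lambda j: row_scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy layer (2 outputs x 4 inputs) and two hypothetical calibration samples.
weight = [[0.5, -0.1, 0.3, 0.05],
          [-0.2, 0.4, -0.6, 0.1]]
calibration = [[1.0, 2.0, 0.1, 4.0],
               [0.5, 1.0, 0.2, 3.0]]

norms = column_norms(calibration)
scores = wanda_like_metric(weight, norms)
pruned = prune_by_metric(weight, scores, sparsity=0.5)
print(pruned)  # [[0.5, 0.0, 0.0, 0.05], [0.0, 0.4, 0.0, 0.1]]
```

In the second row the largest-magnitude weight (-0.6) is dropped because its input feature has a small activation norm. The choice of metric therefore changes which weights survive, and that choice is what a metric search is meant to automate.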

Paper Structure

This paper contains 37 sections, 9 equations, 4 figures, 26 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Overview of the Automatic Symbolic Pruning Metric Discovery Process in our Pruner-Zero framework. This process employs genetic programming to iteratively generate and refine symbolic pruning metrics via tournament selection, subtree crossover, and node mutation. After offspring are generated, the Opposing Operation Simplification (OOS) strategy is applied to reduce repetition. Each candidate is then evaluated on LLaMA-2-7B using the WikiText2 dataset, with perplexity serving as the fitness metric. Note that the post-training pruning evaluation takes less than 5 minutes.
  • Figure 2: Comparison between Evolution Search and Random Search Processes. Notably, the individual perplexity is lower in the evolution search method, leading to significant improvements in the search efficiency and overall stability.
  • Figure 3: Left: Perplexity under Various Sparsity Ratios; Right: Perplexity with Different Calibration Samples.
  • Figure 4: Correlation Matrix of Primitive Operations with Perplexity.
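
The search loop described in the Figure 1 caption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: metrics are nested tuples over hypothetical inputs `W`, `G`, and `X`, the operator lists and the set of opposing pairs are guesses chosen for the example, and `toy_fitness` (node count) stands in for the real fitness, which is perplexity after pruning LLaMA-2-7B on WikiText2.

```python
import random

UNARY = ["abs", "neg", "exp", "log", "sqrt", "sqr"]
BINARY = ["add", "sub", "mul", "div"]
LEAVES = ["W", "G", "X"]  # assumed names: weight, gradient, activation
# Pairs treated as cancelling in this sketch (an assumption, not the paper's list).
OPPOSING = {("exp", "log"), ("log", "exp"), ("sqrt", "sqr"), ("sqr", "sqrt"), ("neg", "neg")}

def random_tree(rng, depth=3):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(UNARY), random_tree(rng, depth - 1))
    return (rng.choice(BINARY), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def simplify(tree):
    """Opposing-operation simplification: collapse op(inverse_op(x)) to x, bottom-up."""
    if isinstance(tree, str):
        return tree
    op, *children = tree
    children = [simplify(c) for c in children]
    if len(children) == 1 and not isinstance(children[0], str):
        inner = children[0]
        if len(inner) == 2 and (op, inner[0]) in OPPOSING:
            return inner[1]
    return (op, *children)

def paths(tree, prefix=()):
    """Yield the index path to every node in the tree."""
    yield prefix
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(rng, a, b):
    """Subtree crossover: graft a random subtree of b onto a random node of a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree):
    """Node mutation: swap one node for another of the same arity."""
    path = rng.choice(list(paths(tree)))
    node = get(tree, path)
    if isinstance(node, str):
        new = rng.choice(LEAVES)
    else:
        new = (rng.choice(UNARY if len(node) == 2 else BINARY), *node[1:])
    return replace(tree, path, new)

def tournament(rng, scored, k=3):
    """Tournament selection: lowest-fitness individual among k random picks."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]

def evolve(fitness, generations=20, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [simplify(random_tree(rng)) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(t), t) for t in population]
        population = []
        while len(population) < pop_size:
            child = crossover(rng, tournament(rng, scored), tournament(rng, scored))
            if rng.random() < 0.3:
                child = mutate(rng, child)
            population.append(simplify(child))  # simplify offspring before evaluation
        best = min(population + [best], key=fitness)
    return best

def toy_fitness(tree):
    """Placeholder for post-pruning perplexity: the node count (lower is better)."""
    return sum(1 for _ in paths(tree))

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Swapping `toy_fitness` for a function that prunes a model with the candidate metric and returns its perplexity would turn the sketch into the kind of search the caption describes. Because every offspring is simplified before it is scored, expressions such as `exp(log(W))` and `W` do not occupy separate slots in the population.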