A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

Pengxiang Zhao; Hanyu Hu; Ping Li; Yi Zheng; Zhefeng Wang; Xiaoming Yuan

TL;DR

This paper proposes a convex optimization model that uses the $\ell_1$ norm to induce sparsity, solves it with FISTA, and introduces FISTAPruner, the first post-training pruner based on convex optimization models and algorithms.

Abstract

Pruning is a critical strategy for compressing trained large language models (LLMs), aiming at substantial memory savings and computational acceleration without compromising performance. However, existing pruning methods often require inefficient retraining for billion-scale LLMs or rely on heuristic methods, such as the optimal brain surgeon framework, which degrade performance. In this paper, we introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms. Specifically, we propose a convex optimization model that incorporates the $\ell_1$ norm to induce sparsity and use the FISTA solver to optimize it. FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning. We comprehensively evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under unstructured and 2:4 semi-structured sparsity, demonstrating superior performance over existing state-of-the-art methods across various language benchmarks.
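To make the idea of an $\ell_1$-regularized layer-wise model solved by FISTA concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard layer-wise reconstruction objective, $\min_V \tfrac{1}{2}\|VX - WX\|_F^2 + \lambda\|V\|_1$, which this excerpt does not state explicitly. The names `fista_prune`, `soft_threshold`, and `lam` are hypothetical, and the sketch omits the error correction, adaptive hyperparameter tuning, and 2:4 handling described in the paper.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def fista_prune(W, X, lam, n_iter=200):
    """Approximately solve min_V 0.5*||V X - W X||_F^2 + lam*||V||_1.

    W: dense weights, shape (out, in). X: activations, shape (in, samples).
    The objective is an assumed form, not taken from the paper's equations.
    """
    H = X @ X.T                      # Gram matrix of the input activations
    target = W @ H                   # constant part of the gradient
    L = np.linalg.eigvalsh(H)[-1]    # Lipschitz constant of the smooth term
    V = W.copy()                     # current iterate
    Y = W.copy()                     # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = Y @ H - target                        # gradient of the smooth term at Y
        V_next = soft_threshold(Y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = V_next + ((t - 1.0) / t_next) * (V_next - V)   # momentum step
        V, t = V_next, t_next
    return V

# Toy usage on random data (sizes are arbitrary).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 64))
V = fista_prune(W, X, lam=40.0, n_iter=500)
print("fraction of exact zeros:", np.mean(V == 0))
```

In this sketch a larger `lam` yields more zeros at the cost of a larger reconstruction error; reaching a specific target sparsity would require tuning `lam` and a final rounding step, neither of which is shown here.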

Paper Structure (20 sections, 25 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Background and Related Work
Methodology
Post-Training Pruning Model with Intra-layer Error Corrections
Optimization based on FISTA
Adaptive Hyperparameter Tuning
FISTAPruner Pseudocode
Experiments
Settings
Perplexity Experiment Results
Zero-Shot Task Results
Ablation Studies
Discussion
Conclusion
Derivations of the Proposed Optimization Model
...and 5 more sections

Figures (6)

Figure 1: Overview of the proposed FISTAPruner. Given a weight matrix $W$ and its corresponding input feature activation $X$, we employ the proposed convex optimization model, utilizing FISTA, to derive the pruned weights.
Figure 2: Illustration of the proposed intra-layer error correction mechanism. $W_1$ and $W_2$ represent the weights of two sequential layers within the network architecture.
Figure 3: Comparative analysis of sparsity versus perplexity across different methods for OPT-125M and LLaMA-3-8B models on WikiText dataset.
Figure 4: Ablation studies of FISTAPruner on the WikiText dataset, showcasing the effects of intra-layer error correction and varying calibration sample sizes.
Figure 5: Ablation studies of FISTAPruner on the PTB dataset, showcasing the effects of intra-layer error correction and varying calibration sample sizes.
...and 1 more figures
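The captions for Figures 1 and 2 refer to a weight matrix $W$, its input activation $X$, and two sequential layers $W_1$ and $W_2$. One plausible reading of the error correction idea, not confirmed by this excerpt, is that the second layer is fitted on the input it actually receives after $W_1$ has been pruned, while the target remains the dense output. The sketch below illustrates only that choice of input and target. It uses magnitude pruning and an unregularized least-squares fit as stand-ins, so the corrected weights are not sparse, and all function names are hypothetical.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero the smallest-magnitude entries (a stand-in pruner, not FISTAPruner)."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    thresh = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > thresh, W, 0.0)

def output_error(V, X_in, W_dense, X_dense):
    """Frobenius distance between a layer's actual output and its dense output."""
    return np.linalg.norm(V @ X_in - W_dense @ X_dense)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 64))        # input to the first layer
W1 = rng.standard_normal((16, 16))
W2 = rng.standard_normal((16, 16))

X1_dense = W1 @ X                        # input to layer 2 in the dense model
X1_pruned = magnitude_prune(W1, 0.5) @ X # input layer 2 sees after pruning W1

# Without correction: W2 is left as-is, so the upstream error passes through.
err_uncorrected = output_error(W2, X1_pruned, W2, X1_dense)

# With correction (assumed form): fit V so that V @ X1_pruned matches W2 @ X1_dense.
V = np.linalg.lstsq(X1_pruned.T, (W2 @ X1_dense).T, rcond=None)[0].T
err_corrected = output_error(V, X1_pruned, W2, X1_dense)

print("uncorrected:", err_uncorrected, "corrected:", err_corrected)
```

Because `W2` itself is a feasible candidate in the least-squares fit, the corrected error can never exceed the uncorrected one in this toy setting. A sparsity-inducing solver would replace the least-squares step in a real pruner.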

Theorems & Definitions (2)

Remark 3.1
Remark 3.2
