Large Language Model Pruning

Hanjuan Huang; Hao-Jia Song; Hsing-Kuo Pao

TL;DR

This work tackles the challenge of compressing large language models without labeled data or retraining. It introduces an explainability-driven pruning method that uses mutual information between hidden neurons in FFN layers, estimated via a matrix-based Rényi entropy approach with a novel kernel-width selection mechanism. The framework includes a scalability option via clustering and a subsidiary KL-divergence criterion to select among random seeds, enabling effective pruning with minimal loss of performance on GLUE tasks using a compact BERT-tiny baseline. Empirical results show competitive or superior performance compared to unsupervised and some supervised/self-supervised baselines, highlighting the method's promise for edge deployment and green AI, with room for scaling to larger LLMs.
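The matrix-based Rényi entropy estimate of mutual information mentioned above can be sketched as follows. This is a minimal illustration of the standard matrix-based estimator, not the authors' implementation: the function names, the Gaussian kernel, the default `alpha`, and the fixed `sigma` are all assumptions made for this example. In particular, the paper's kernel-width selection mechanism is not reproduced here; `sigma` is simply a parameter the caller supplies.

```python
import numpy as np


def gram_matrix(z, sigma):
    """Trace-normalized Gaussian Gram matrix for n samples of one neuron's output.

    z has shape (n,) or (n, d). sigma is the kernel width (assumed given;
    the paper's own selection rule is not implemented in this sketch).
    """
    z = np.asarray(z, dtype=float)
    z = z.reshape(len(z), -1)
    sq = np.sum(z ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=2.0):
    """Matrix-based Renyi alpha-entropy (in bits) of a trace-one PSD matrix."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))


def joint_entropy(a, b, alpha=2.0):
    """Joint entropy from the renormalized Hadamard product of two Gram matrices."""
    h = a * b
    return renyi_entropy(h / np.trace(h), alpha)


def mutual_information(z_k, z_l, sigma=1.0, alpha=2.0):
    """I(Z_k; Z_l) = S(A) + S(B) - S(A, B) for two neurons' activations."""
    a = gram_matrix(z_k, sigma)
    b = gram_matrix(z_l, sigma)
    return renyi_entropy(a, alpha) + renyi_entropy(b, alpha) - joint_entropy(a, b, alpha)
```

Under these assumptions, calling `mutual_information` on the activations of two hidden neurons over the same batch of inputs gives a score that is larger for a neuron paired with a copy of itself than for a pair of independent neurons. How that score feeds into the pruning decision, and how the kernel width is chosen, is left to the paper's method sections.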

Abstract

Larger models have delivered superior performance in recent years, as hardware and software have matured enough to support extremely large models. Application areas include text mining, among others. In particular, the success of LLMs in text understanding and text generation has drawn attention from researchers who have worked on NLP and related areas for years or even decades. On the other hand, LLMs can suffer from problems such as overfitting, hallucination, and device limitations, to name a few. In this work, we propose a model pruning technique specifically focused on LLMs. The proposed methodology emphasizes the explainability of deep learning models. With a theoretical foundation, we obtain a trustworthy deep model, so that huge models with a massive number of parameters become less necessary. A mutual information-based estimate is used to identify redundant neurons to eliminate. Moreover, an estimator with well-tuned parameters yields precise estimates that guide the pruning procedure. We also explore the difference between pruning large-scale models and pruning small-scale models: small models are sensitive to the choice of pruning criterion, whereas large-scale models are not, which is a novel finding of this work. Overall, we demonstrate the superiority of the proposed model over state-of-the-art models.

Paper Structure (24 sections, 9 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Past Work
Structured Pruning for LLMs
Feature Selection based on Mutual Information
Mutual Information Estimation
Proposed Method
Notations
Framework
The Proposed Pruning Methodology
Working on Feed-forward Networks
Redundancy as Feature Selection Criteria
Clustering Strategy as a Scaling up Option
Subsidiary Condition
Estimate Method of Kernel Width Parameter of Hidden Neuron
Experiment Result
...and 9 more sections

Figures (7)

Figure 1: The pruning procedure has the input LLM with $L$ transformer blocks and produces the resulting model with the pruned FFN FC layer. We randomly select two neurons $k$ and $\ell$ and compute their MI $I(Z_k;Z_\ell)$. If their MI value is smaller than a pre-specified threshold $T_r$, one of the neurons is deleted. We repeat the step until a maximum number of iterations is reached.
Figure 2: Performance of the proposed method against other methods on BERT-tiny. Evaluation uses the dev set, with relative FLOPs varied in percentage increments.
Figure 3: Performance of the proposed method against other MI estimators on BERT-tiny. Evaluation uses the dev set, with relative FLOPs varied in percentage increments.
Figure 4: Results based on mutual information versus the Pearson correlation coefficient.
Figure 5: Insignificant difference between the results from the complete and the partial dataset.
...and 2 more figures
