Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability

Ashhadul Islam, Samir Brahim Belhaouari, Amine Bermak

TL;DR

The paper tackles the environmental impact of training large language models by introducing a training-aware pruning method that continually evaluates the importance of individual weights across epochs. By computing a weighted importance score $Imp_i$ and maintaining a weighted-average clone of the parameters, the approach guides pruning with an explicit threshold, $\text{Threshold} = \sigma(W_{abs}) \times \text{PruneRate}$, where $W_{abs} = |W|$. Empirical results on a scaled-down 10.7M-parameter Transformer and the 4.2B-parameter Phi-3-vision model show that moderate pruning can improve efficiency or reduce loss/MAE, while aggressive pruning drastically degrades performance. The findings advocate sustainable AI development through training-aware sparsity, balancing compression against accuracy in both language and multimodal settings.
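
The threshold rule above can be illustrated with a minimal pure-Python sketch. It assumes that σ denotes the population standard deviation of |W| and that a weight is pruned (set to zero) when its magnitude falls strictly below the threshold; the function names `prune_by_std_threshold` and `sparsity` are hypothetical and not taken from the paper.

```python
import statistics
from typing import List, Tuple


def prune_by_std_threshold(
    weights: List[float], prune_rate: float
) -> Tuple[List[float], float]:
    """Zero out weights whose magnitude is below sigma(|W|) * prune_rate.

    Assumptions (not confirmed by the paper): sigma is the population
    standard deviation of |W|, and a weight is pruned when |w| < threshold.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    w_abs = [abs(w) for w in weights]                   # W_abs = |W|
    threshold = statistics.pstdev(w_abs) * prune_rate   # Threshold = sigma(W_abs) * PruneRate
    # Keep weights at or above the threshold; set the rest to zero.
    pruned = [0.0 if a < threshold else w for w, a in zip(weights, w_abs)]
    return pruned, threshold


def sparsity(weights: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.5, -0.05, 0.2, -0.8, 0.01]
    for rate in (0.5, 1.0):
        pruned, thr = prune_by_std_threshold(w, rate)
        print(f"rate={rate}: threshold={thr:.4f}, sparsity={sparsity(pruned):.2f}, pruned={pruned}")
```

For the five example weights, `prune_rate=1.0` gives a threshold of about 0.2986 and zeroes the three smallest-magnitude entries, while `prune_rate=0.5` halves the threshold and also keeps 0.2. Because the threshold scales with the spread of |W| rather than with a fixed quantile, the fraction of weights removed at a given PruneRate depends on the weight distribution.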

Abstract

The exponential growth of large language models (LLMs) like ChatGPT has revolutionized artificial intelligence, offering unprecedented capabilities in natural language processing. However, the extensive computational resources required for training these models have significant environmental implications, including high carbon emissions, energy consumption, and water usage. This research presents a novel approach to LLM pruning, focusing on the systematic evaluation of individual weight importance throughout the training process. By monitoring parameter evolution over time, we propose a method that effectively reduces model size without compromising performance. Extensive experiments with both a scaled-down LLM and a large multimodal model reveal that moderate pruning enhances efficiency and reduces loss, while excessive pruning drastically deteriorates model performance. These findings highlight the critical need for optimized AI models to ensure sustainable development, balancing technological advancement with environmental responsibility.
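
Monitoring parameter evolution over training can be sketched as a tracker that folds each epoch's parameter snapshot into a weighted-average clone. This is a hypothetical illustration: the class name `WeightedAverageClone`, the choice of giving epoch e a weight of e, and the use of the clone's magnitude as a stand-in for the importance score are all assumptions, since the exact weighting and importance formula are not reproduced in this summary.

```python
from typing import List


class WeightedAverageClone:
    """Hypothetical tracker holding an epoch-weighted average of each parameter.

    Assumption: the snapshot taken at epoch e (1-based) gets weight e, so later
    epochs count more. This weighting is illustrative, not the paper's formula.
    """

    def __init__(self, num_params: int) -> None:
        self.weighted_sum = [0.0] * num_params
        self.total_weight = 0.0
        self.epoch = 0

    def update(self, params: List[float]) -> None:
        """Fold in the parameter snapshot recorded at the end of one epoch."""
        if len(params) != len(self.weighted_sum):
            raise ValueError("snapshot size does not match tracker size")
        self.epoch += 1
        w = float(self.epoch)  # linearly increasing epoch weight (assumption)
        for i, p in enumerate(params):
            self.weighted_sum[i] += w * p
        self.total_weight += w

    def clone(self) -> List[float]:
        """Return the weighted-average parameters (the 'clone')."""
        if self.total_weight == 0.0:
            raise ValueError("no snapshots recorded yet")
        return [s / self.total_weight for s in self.weighted_sum]

    def importance(self) -> List[float]:
        """Hypothetical stand-in for Imp_i: magnitude of each averaged parameter."""
        return [abs(c) for c in self.clone()]


if __name__ == "__main__":
    tracker = WeightedAverageClone(num_params=2)
    # Weight 0 grows steadily; weight 1 hovers near zero.
    for snapshot in ([0.1, 0.02], [0.4, -0.01], [0.9, 0.01]):
        tracker.update(snapshot)
    print(tracker.clone())  # approximately [0.6, 0.005]
    print(tracker.importance())
```

With these three snapshots the steadily growing first weight averages to about 0.6, while the second, which hovers near zero, averages to about 0.005, so the hypothetical score separates the two clearly. Weighting later epochs more heavily lets the clone reflect where a parameter is settling rather than its early, noisier values.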

Paper Structure

This paper contains 17 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: (a) Depicts the evolution of randomly selected weights over 100 training epochs. (b) Illustrates the progression of specific weights across the same 100 training epochs.
  • Figure 2: Evolution of weights over training epochs. Panels (a) and (b) show weights that increase dramatically, while (c) and (d) show weights that fluctuate minimally. Panels (e) and (f) further illustrate the divergence and stability of weights, highlighting patterns critical for effective pruning.
  • Figure 3: Loss as a function of compression level. Loss decreases up to 60% compression and then rises sharply, particularly beyond 70% compression, consistent with the trends in the compression-versus-loss table.
  • Figure 4: Price error as a function of compression level. The model maintains a relatively low error up to moderate compression, but the error escalates sharply beyond 30% compression, consistent with the trends in the MAE table.