Symmetric Pruning of Large Language Models

Kai Yi, Peter Richtárik

TL;DR

$R^2$-DSnoT is a novel training-free fine-tuning approach that incorporates relative weight importance and a regularized decision boundary within a dynamic pruning-and-growing framework, significantly outperforming strong baselines and establishing a new state of the art.

Abstract

Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs that have shown exceptional empirical performance. Wanda optimizes performance through calibrated activations during pruning, while RIA emphasizes the relative, rather than absolute, importance of weight elements. Despite their practical success, a thorough theoretical foundation explaining these outcomes has been lacking. This paper introduces new theoretical insights that redefine the standard minimization objective for pruning, offering a deeper understanding of the factors contributing to their success. Our study extends beyond these insights by proposing complementary strategies that consider both input activations and weight significance. We validate these approaches through rigorous experiments, demonstrating substantial enhancements over existing methods. Furthermore, we introduce a novel training-free fine-tuning approach $R^2$-DSnoT that incorporates relative weight importance and a regularized decision boundary within a dynamic pruning-and-growing framework, significantly outperforming strong baselines and establishing a new state of the art.
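To make the two scoring ideas mentioned in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the commonly cited forms of the two scores: Wanda as $|\mathbf{W}_{jk}| \cdot \|\mathbf{X}_{:j}\|_2$, and RIA as the sum of row-normalized and column-normalized weight magnitudes, scaled by the activation norm raised to a power `alpha`. The function names, the `(in_features, out_features)` weight layout, the default `alpha=0.5`, and the per-output pruning rule are all assumptions made for illustration.

```python
import numpy as np

def wanda_score(W, X):
    """Wanda-style score: |W_jk| * ||X_{:j}||_2 (assumed form).

    W: (in_features, out_features) weights, applied as X @ W.
    X: (n_tokens, in_features) calibration activations.
    """
    act_norm = np.linalg.norm(X, axis=0)  # one norm per input feature
    return np.abs(W) * act_norm[:, None]

def relative_importance(W, eps=1e-12):
    """RIA-style relative importance (assumed form): each |W_jk| is
    normalized by its row sum and by its column sum, and the two are added."""
    A = np.abs(W)
    row_sum = A.sum(axis=1, keepdims=True)
    col_sum = A.sum(axis=0, keepdims=True)
    return A / (row_sum + eps) + A / (col_sum + eps)

def ria_score(W, X, alpha=0.5):
    """Relative importance scaled by activation norms; alpha is a hypothetical default."""
    act_norm = np.linalg.norm(X, axis=0)
    return relative_importance(W) * (act_norm[:, None] ** alpha)

def prune_lowest(W, score, sparsity=0.5):
    """Zero the lowest-scoring fraction of weights within each output column."""
    W_pruned = W.copy()
    n_prune = int(W.shape[0] * sparsity)
    idx = np.argsort(score, axis=0)[:n_prune, :]  # lowest scores per column
    np.put_along_axis(W_pruned, idx, 0.0, axis=0)
    return W_pruned

# Usage on random data (illustrative only).
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(8, 4))
X_demo = rng.normal(size=(16, 8))
W_wanda = prune_lowest(W_demo, wanda_score(W_demo, X_demo), sparsity=0.5)
W_ria = prune_lowest(W_demo, ria_score(W_demo, X_demo), sparsity=0.5)
```

The only difference between the two variants is the score matrix passed to `prune_lowest`: Wanda ranks weights by absolute magnitude weighted by activations, while the RIA-style score first rescales each weight relative to the rest of its row and column.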

Paper Structure

This paper contains 38 sections, 10 theorems, 56 equations, 1 figure, 10 tables.

Key Result

Lemma 3.1

Suppose we eliminate a single weight $\mathbf{W}_{jk}$ by setting $\widetilde{\mathbf{W}}_{jk} = 0$ while keeping all other weights unchanged. Then $g(\widetilde{\mathbf{W}})$ simplifies to a closed-form expression in $\mathbf{X}_{:j}$ and $\mathbf{Y}_{k:}$, the $j$-th column of $\mathbf{X}$ and the $k$-th row of $\mathbf{Y}$, respectively.
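The definition of $g$ is not reproduced here, so the following is a sketch under a stated assumption rather than the lemma's exact statement. Assume $g(\widetilde{\mathbf{W}}) = \|\mathbf{X}(\widetilde{\mathbf{W}} - \mathbf{W})\|_F + \|(\widetilde{\mathbf{W}} - \mathbf{W})\mathbf{Y}\|_F$. Zeroing only $\mathbf{W}_{jk}$ makes $\widetilde{\mathbf{W}} - \mathbf{W}$ a matrix with the single nonzero entry $-\mathbf{W}_{jk}$, so the two terms collapse to $|\mathbf{W}_{jk}| \, \|\mathbf{X}_{:j}\|_2$ and $|\mathbf{W}_{jk}| \, \|\mathbf{Y}_{k:}\|_2$. The NumPy check below compares both sides on random data; the function names and shapes are hypothetical.

```python
import numpy as np

def g(W_tilde, W, X, Y):
    """Assumed symmetric objective: ||X (W~ - W)||_F + ||(W~ - W) Y||_F."""
    D = W_tilde - W
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.linalg.norm(X @ D) + np.linalg.norm(D @ Y)

def single_weight_score(W, X, Y, j, k):
    """Closed form under the assumption above:
    |W_jk| * (||X_{:j}||_2 + ||Y_{k:}||_2)."""
    return abs(W[j, k]) * (np.linalg.norm(X[:, j]) + np.linalg.norm(Y[k, :]))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # left factor (e.g. input activations)
W = rng.normal(size=(4, 5))  # weight matrix
Y = rng.normal(size=(5, 3))  # right factor

j, k = 2, 1
W_tilde = W.copy()
W_tilde[j, k] = 0.0          # prune exactly one weight

lhs = g(W_tilde, W, X, Y)
rhs = single_weight_score(W, X, Y, j, k)
print(np.isclose(lhs, rhs))
```

Under this assumption, the per-weight cost depends on both the column of $\mathbf{X}$ that multiplies row $j$ of $\mathbf{W}$ and the row of $\mathbf{Y}$ that multiplies column $k$, which is what makes the resulting score symmetric in the two sides of the weight matrix.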

Figures (1)

  • Figure 1: Visualization of the dense weight matrix in LLaMA2-7b.

Theorems & Definitions (10)

  • Lemma 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Corollary 3.4
  • Theorem 3.5
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 3.8: Generalized $\ell_p$-norm
  • Lemma 3.9: Random unit vector scaling
  • Lemma 3.10