COPAL: Continual Pruning in Large Language Generative Models

Srikanth Malla, Joon Hee Choi, Chiho Choi

TL;DR

COPAL tackles the dual challenge of efficiency and continual adaptation for large language models by introducing training-free continual pruning guided by a sensitivity analysis. It identifies and mitigates weight stasis while controlling forgetting through a calibration-driven pruning framework that updates a weight-importance metric across a sequence of datasets. The method demonstrates substantial gains in backward transfer and competitive perplexity across LLaMA models (7B–65B) and pruning styles (unstructured and semi-structured), outperforming existing post-training pruning baselines without any re-training. This approach offers a practical path to efficient, continually adaptable LLMs suitable for deployment in dynamic, data-shifting environments.

Abstract

Adapting pre-trained large language models to different domains in natural language processing raises two key challenges: high computational demands and the model's inability to adapt continually. To address both issues simultaneously, this paper presents COPAL (COntinual Pruning in Adaptive Language settings), an algorithm developed for pruning large language generative models under a continual model adaptation setting. While avoiding resource-heavy finetuning or retraining, our pruning process is guided by the proposed sensitivity analysis. The sensitivity effectively measures the model's ability to withstand perturbations introduced by the new dataset and finds the model's weights that are relevant for all encountered datasets. As a result, COPAL allows seamless model adaptation to new domains while enhancing resource efficiency. Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models, demonstrating its efficacy in efficiency and adaptability.

Paper Structure

This paper contains 22 sections, 28 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Motivation: average (mean) and worst-case (max) backward transfer (BWT) in perplexity as the sparsity ratio increases, for unstructured continual pruning of LLaMA-7B.
  • Figure 2: Overview of forgetting, weight stasis, and the COPAL framework in continual pruning. D denotes the calibration data used for pruning, and D1 → D2 → D3 is the order in which the datasets are introduced.
  • Figure 3: Motivation: average (mean) and worst-case (max) backward transfer in perplexity as the number of samples increases, for unstructured continual pruning of LLaMA-7B at 50% sparsity.
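
The backward transfer (BWT) reported in Figures 1 and 3 can be illustrated with a minimal sketch. It assumes the common continual-learning definition (the metric on an earlier dataset after the final step, minus the metric measured right after that dataset was first seen); the paper's exact formula is not shown in this section and may differ. The function name `backward_transfer` and all perplexity values below are hypothetical, chosen only for illustration. Because the metric is perplexity, a larger positive value means more forgetting.

```python
def backward_transfer(ppl):
    """Return (mean, max) backward transfer over earlier datasets.

    ppl[t][i] is the perplexity on dataset i measured after pruning step t,
    for i <= t. This layout is an assumption made for this sketch.
    """
    if len(ppl) < 2:
        raise ValueError("need at least two pruning steps to measure BWT")
    final = len(ppl) - 1
    # Change in perplexity on each earlier dataset between the step at which
    # it was introduced and the final step.
    deltas = [ppl[final][i] - ppl[i][i] for i in range(final)]
    return sum(deltas) / len(deltas), max(deltas)


# Hypothetical perplexities for the order D1 -> D2 -> D3 (not from the paper).
ppl = [
    [6.0],              # after pruning with D1
    [6.5, 8.0],         # after pruning with D2
    [7.0, 8.25, 10.0],  # after pruning with D3
]
mean_bwt, max_bwt = backward_transfer(ppl)
print(mean_bwt, max_bwt)
```

Here the mean corresponds to the average case and the max to the worst case named in the figure captions.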