Pruning as a Domain-specific LLM Extractor

Nan Zhang, Yanchi Liu, Xujiang Zhao, Wei Cheng, Runxue Bao, Rui Zhang, Prasenjit Mitra, Haifeng Chen

TL;DR

D-Pruner tackles domain-specific compression of LLMs by combining generality-preserving pruning with domain-specific emphasis. It computes general weight importance from an open-domain calibration set, incorporates this into a regularized next-token objective, and uses an empirical Fisher-based approximation to derive a dual-pruning score that guides unstructured pruning on domain calibration data. The method yields sparse, domain-tuned models that maintain strong linguistic ability, multi-task solving, and domain expertise, outperforming several baselines on healthcare and legal tasks, especially in summarization and domain-specific NLI/QA. Practically, D-Pruner enables efficient deployment of domain-aware LLMs with limited calibration data, though it can be more memory-intensive during pruning and shows some perplexity gap relative to the strongest baselines in certain settings.

Abstract

Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While a few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. When applied to domain-specific challenges, this leads to suboptimal performance, because the pruned model lacks either specificity to the target domain or generality across tasks. This work introduces an innovative unstructured dual-pruning methodology, D-Pruner, for domain-specific compression of LLMs. It extracts a compressed, domain-specific, and task-agnostic LLM by identifying the weights that are pivotal for general capabilities, such as linguistic capability and multi-task solving, and for domain-specific knowledge. More specifically, we first assess general weight importance by quantifying the error incurred upon removing each weight, with the help of an open-domain calibration dataset. Then, we use this general weight importance to refine the training loss, so that it preserves generality when fitting to a specific domain. Moreover, by efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model that emphasizes both generality and specificity. Our comprehensive experiments across various tasks in the healthcare and legal domains show the effectiveness of D-Pruner in domain-specific compression. Our code is available at https://github.com/psunlpgroup/D-Pruner.
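
To make the pipeline described above more concrete, the following is a minimal sketch of how a dual-pruning score could be computed for a handful of weights. It is an illustration under stated assumptions, not the authors' implementation: the function names, the regularization strength `lam`, and the exact way the terms are combined are hypothetical. The sketch assumes a second-order Taylor estimate of the loss change when a weight is set to zero, with the Hessian diagonal replaced by the squared gradient (an empirical Fisher approximation), plus a penalty weighted by a precomputed general importance value.

```python
def dual_pruning_scores(weights, grads, general_importance, lam=0.1):
    """Estimate how much the regularized loss changes if each weight is zeroed.

    Assumption: the per-weight gradient comes from the next-token loss on
    domain calibration data, and general_importance was computed beforehand
    on an open-domain calibration set. Higher score = more worth keeping.
    """
    scores = []
    for w, g, G in zip(weights, grads, general_importance):
        first_order = -g * w              # Taylor term g * delta, with delta = -w
        fisher = 0.5 * (g * w) ** 2       # 0.5 * H * delta^2, with H approximated by g^2
        generality = lam * G * w * w      # penalty for moving a generally important weight
        scores.append(abs(first_order + fisher + generality))
    return scores


def prune_mask(scores, sparsity):
    """Unstructured pruning: zero out the lowest-scoring fraction of weights."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


# Toy values chosen for illustration only.
weights = [0.5, -0.1, 0.8, 0.05]
grads = [0.2, 0.4, -0.1, 0.3]
general_importance = [1.0, 0.1, 2.0, 0.1]

scores = dual_pruning_scores(weights, grads, general_importance, lam=0.1)
mask = prune_mask(scores, sparsity=0.5)
print(mask)  # -> [1, 0, 1, 0]
```

In this toy run, the third weight is kept mainly because of its large general importance, while the first is kept mainly because of its domain gradient, which reflects the intent of preserving weights that matter for generality or specificity.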

Paper Structure

This paper contains 27 sections, 8 equations, 2 figures, and 8 tables.

Figures (2)

  • Figure 1: Different types of pruning methods. An LLM is composed of domain-shared weights and domain-specific weights. Post-training pruning focuses on domain-shared weights for generality, pruning with fine-tuning focuses on domain-specific weights for specificity, and our dual-pruning method preserves weights pivotal for both generality and specificity.
  • Figure 2: Illustration of mask similarity. It shows that masks for different domains are quite different. The self-attention modules contribute more to specificity, and MLP modules store knowledge that is shared by different domains.