EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning
Songlin Zhao, Michael Pitts, Zhuwei Qin
TL;DR
EfficientXpert addresses domain-adaptive compression of large language models by combining a propagation-aware pruning criterion (Foresight Mask) with an adapter realignment step (Partial Brain Surgeon) inside LoRA fine-tuning. The pruning criterion accounts for how errors propagate forward across layers, and the realignment step updates the low-rank adapters after pruning so that they match the surviving subnetwork. Together they turn a dense pretrained model into a sparse, domain-specialized expert in a single step. Across health and legal domains, EfficientXpert consistently outperforms existing domain-pruning baselines, retaining up to 98% of dense-model performance at 40% sparsity. The analysis further indicates that domain shift, rather than the task, largely drives pruning sensitivity, so masks derived for general-purpose use transfer poorly. These findings argue for domain-adaptive pruning as a prerequisite for practical, resource-efficient LLM deployment in specialized domains.
Abstract
The rapid advancement of large language models (LLMs) has increased the demand for domain-specialized variants in areas such as law, healthcare, and finance. However, their large size remains a barrier to deployment in resource-constrained environments, and existing compression methods either generalize poorly across domains or incur high overhead. In this work, we propose EfficientXpert, a lightweight domain-pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with an efficient adapter-update algorithm (Partial Brain Surgeon). Integrated into the LoRA fine-tuning process, EfficientXpert enables a one-step transformation of general pretrained models into sparse, domain-adapted experts. Across health and legal tasks, it retains up to 98% of dense-model performance at 40% sparsity, outperforming state-of-the-art methods. Further analysis reveals substantial domain-dependent structural shifts that degrade the effectiveness of general pruning masks, underscoring the need for pruning strategies adapted to each domain.
