Compressing Language Models for Specialized Domains
Miles Williams, George Chrysostomou, Vitor Jeronymo, Nikolaos Aletras
TL;DR
This work tackles the degradation in domain-specific performance that occurs when compressing large language models by introducing cross-calibration, a training-free Hessian-based method that blends domain-specific and general knowledge. By decomposing the Hessian into domain-specific and general components and combining them via a regularization parameter, cross-calibration identifies weights influential for both in-domain and general performance without retraining. Empirical results across biomedical and legal domains show that cross-calibration outperforms existing domain-aware pruning methods while preserving general capabilities, and that it remains effective when combined with quantization, all with comparable or lower computational overhead. The approach is demonstrated to be language-agnostic and scalable across model families and sizes, enabling practical deployment of domain-specialized compressed LMs.
Abstract
Compression techniques such as pruning and quantization offer a solution for more efficient deployment of language models (LMs), albeit with small drops in benchmark performance. However, general-purpose LM compression methods can negatively affect performance in specialized domains (e.g. biomedical or legal). Recent work has sought to address this, yet requires computationally expensive full-parameter fine-tuning. To this end, we propose cross-calibration, a novel training-free approach for improving the domain performance of compressed LMs. Our approach leverages Hessian-based sensitivity to identify weights that are influential for both in-domain and general performance. Through extensive experimentation, we demonstrate that cross-calibration substantially outperforms existing approaches on domain-specific tasks, without compromising general performance. Notably, these gains come without additional computational overhead, demonstrating strong potential for extracting domain-specialized compressed models from general-purpose LMs.
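To make the idea of combining domain-specific and general Hessian components concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common layer-wise Hessian proxy X^T X computed from calibration activations, a convex blend controlled by a hypothetical parameter `alpha`, and a simplified diagonal saliency score (squared weight times Hessian diagonal). The function names `gram`, `blend`, and `prune_row` are illustrative only; the exact combination rule and pruning criterion used by cross-calibration may differ.

```python
def gram(acts):
    """Mean outer product X^T X / n of input activation vectors.

    Used here as an assumed layer-wise Hessian proxy for a linear layer.
    """
    d = len(acts[0])
    h = [[0.0] * d for _ in range(d)]
    for x in acts:
        for i in range(d):
            for j in range(d):
                h[i][j] += x[i] * x[j] / len(acts)
    return h


def blend(h_domain, h_general, alpha):
    """Assumed convex combination: (1 - alpha) * H_domain + alpha * H_general."""
    d = len(h_domain)
    return [
        [(1 - alpha) * h_domain[i][j] + alpha * h_general[i][j] for j in range(d)]
        for i in range(d)
    ]


def prune_row(weights, hessian, sparsity):
    """Zero the lowest-saliency weights of one output row.

    Saliency is the simplified diagonal score w_j^2 * H_jj (an assumption).
    """
    scores = [w * w * hessian[j][j] for j, w in enumerate(weights)]
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda j: scores[j])
    drop = set(order[:n_prune])
    return [0.0 if j in drop else w for j, w in enumerate(weights)]


# Toy calibration data: feature 0 is active on domain text,
# feature 1 on general text, feature 2 weakly on both, feature 3 never.
domain_acts = [[2.0, 0.0, 0.1, 0.0], [2.0, 0.0, 0.1, 0.0]]
general_acts = [[0.0, 2.0, 0.1, 0.0], [0.0, 2.0, 0.1, 0.0]]
weights = [1.0, 1.0, 1.0, 1.0]

h_domain = gram(domain_acts)
h_general = gram(general_acts)

domain_only = prune_row(weights, h_domain, sparsity=0.5)
blended = prune_row(weights, blend(h_domain, h_general, alpha=0.5), sparsity=0.5)

print(domain_only)  # [1.0, 0.0, 1.0, 0.0]
print(blended)      # [1.0, 1.0, 0.0, 0.0]
```

In this toy setting, calibrating on domain data alone prunes the weight attached to the general-purpose feature, while the blended Hessian keeps both the domain and the general weight and removes the two least influential ones. This mirrors the stated goal of retaining weights that matter for both in-domain and general performance, under the simplifying assumptions above.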
