Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar
TL;DR
This work tackles safety in locally deployed compressed LLMs by evaluating four dimensions that go beyond perplexity alone: degeneration harm, representational harm, dialect bias, and language modeling and downstream task performance. It systematically compares pruning (unstructured and semi-structured) and quantization across two base models (Llama-2 and Tülu-2) and multiple sizes, revealing that compression can reduce generation toxicity while sometimes increasing discrimination-related biases, with divergent effects across protected groups and dialects. Key findings show that quantization tends to preserve safety and performance better than pruning at comparable compression rates, whereas supervised fine-tuning (SFT) can reduce degeneration harm but not representational harm, and the order of pruning and fine-tuning matters for downstream tasks and biases. The paper advocates integrating fine-grained safety evaluations into compression workflows to ensure reliable, equitable behavior in real-world deployments.
Abstract
Increasingly, model compression techniques enable large language models (LLMs) to be deployed in real-world applications. As a result of this momentum towards local deployment, compressed LLMs will interact with a large population. Prior work on compression typically prioritizes preserving perplexity, which is directly analogous to training loss. The impact of compression methods on other critical aspects of model behavior, particularly safety, requires systematic assessment. To this end, we investigate the impact of model compression along four dimensions: (1) degeneration harm, i.e., bias and toxicity in generation; (2) representational harm, i.e., biases in discriminative tasks; (3) dialect bias; and (4) language modeling and downstream task performance. We examine a wide spectrum of LLM compression techniques, including unstructured pruning, semi-structured pruning, and quantization. Our analysis reveals that compression can lead to unexpected consequences. Although compression may unintentionally alleviate LLMs' degeneration harm, it can still exacerbate representational harm. Furthermore, increasing compression produces a divergent impact on different protected groups. Finally, different compression methods have drastically different safety impacts: for example, quantization mostly preserves the uncompressed model's bias, whereas pruned models degrade quickly. Our findings underscore the importance of integrating safety assessments into the development of compressed LLMs to ensure their reliability across real-world applications. Our implementation and results are available at https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval.
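To make the link between perplexity and training loss concrete, the following is a minimal sketch, not taken from the paper's released code. It assumes the standard definition of perplexity as the exponential of the mean token-level negative log-likelihood; the function name `perplexity` and the example probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities the model assigned to each reference token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood: the same quantity as token-level cross-entropy loss.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that loss.
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 0.25 to every correct token.
example_log_probs = [math.log(0.25)] * 8
example_ppl = perplexity(example_log_probs)
```

Because perplexity is a monotone function of the average loss, a compression method tuned to preserve it is in effect tuned to preserve the training objective, which by itself says nothing about toxicity, bias, or dialect robustness.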
