Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong; Jinhao Duan; Chenhui Zhang; Zhangheng Li; Chulin Xie; Kelsey Lieberman; James Diffenderfer; Brian Bartoldson; Ajay Jaiswal; Kaidi Xu; Bhavya Kailkhura; Dan Hendrycks; Dawn Song; Zhangyang Wang; Bo Li

TL;DR

This work evaluates how compression affects LLM trustworthiness across eight dimensions, applying five SoTA compression methods to three 13B models. It finds that quantization at moderate bit-widths (e.g., 4-bit) can preserve or even improve trustworthiness in several dimensions, while pruning often harms it. Extreme 3-bit quantization can cause substantial declines in safety and robustness, so benign performance alone is insufficient for deployment decisions. The study also compares 7B-sized models obtained by training from scratch with those obtained by compressing 13B models, showing that the resulting trustworthiness varies by route and dimension, and it provides practical recommendations for achieving high utility, efficiency, and trustworthiness in compressed LLMs.

Abstract

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inference. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning for achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range can unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, which in turn mandates comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Code and models are available at https://decoding-comp-trust.github.io.
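To make the two compression families named in the abstract concrete, here is a minimal sketch of unstructured magnitude pruning at 50% sparsity and symmetric round-to-nearest quantization at 4 and 3 bits. These are generic textbook baselines chosen for illustration; they are not claimed to be the five SoTA methods evaluated in the paper, and the function names and toy weights are hypothetical.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 3 for 3 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax  # one scale for the whole list (per-tensor)
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


# Toy weights, purely illustrative (not taken from any model).
toy = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3]
pruned = magnitude_prune(toy, sparsity=0.5)
four_bit = quantize_rtn(toy, bits=4)
three_bit = quantize_rtn(toy, bits=3)
```

Pruning removes half of the weights outright, while quantization keeps every weight but snaps it to one of at most 2^bits levels, so dropping from 4 to 3 bits halves the number of available levels.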

Paper Structure (20 sections, 25 figures, 4 tables)

Introduction
Related Works
Assessing the Trustworthiness of Compressed LLMs
Revisiting Paths to 7B-sized LLMs: Training Smaller, or Compressing Larger?
From Moderate to High Compression Rates: The (Unexpected) Gains and Losses
Finding the Essential Compression Rates and Induced Gains for Trustworthiness
The Losses on the Extreme Compression Rate
Bag of Tricks for Trustworthy Compression
Conclusion
Additional Related Works
Additional Experimental Results
Detailed Breakdown Results of DecodingTrust Benchmark
AdvGLUE++
Adversarial Demonstration
Out-of-Distribution (OOD)
...and 5 more sections

Figures (25)

Figure 1: Our evaluation aims to assess the trustworthiness of LLMs under compression. Leveraging the DecodingTrust trustworthiness evaluation benchmark (Wang et al., 2023), we compare various paths toward efficient small LLMs, including pre-training and different compression algorithms. We uncover the hidden effect of compression on diverse trustworthiness metrics and identify a bag of tricks for efficient and trustworthy LLMs.
Figure 2: Relative score difference w.r.t. 13b models. Every model is compressed at a 50% rate, which leads to a model size similar to that of the 7b model. Darker blue/red colors indicate larger improvements/drops w.r.t. the 13b dense models. Gray dots/lines per cell indicate significantly lower/higher refusal rates (over 10%), which can bias the measured opinion/knowledge of a model. Quantization appears to be the most effective solution, with minimal loss on both trustworthiness and benign performance. Scores of the dense models are given in the figure labeled fig:radar_models.
Figure 3: Compressing LLAMA2 13b Chat to the low-bit region (below 8 bits, as shown on the x-axis) makes the model less consistent with the dense model, but the effect may be positive in some perspectives. Black and red lines indicate the performance of the 13b and 7b dense models, respectively. Standard deviations are reported at the lower bit-widths. Grey areas indicate drops over 5 points. Dashed lines represent +/- 5 points w.r.t. the scores of the 13b model.
Figure 4: Evaluation of GPTQ-quantized LLAMA2 13b Chat models in four Ethics scenarios in terms of performance (error rate or FPR) and refusal rate. When facing evasive sentences, 4-bit quantization can significantly reduce the portion of misclassified immoral actions (i.e., lower FPR). In other scenarios, the 4-bit model substantially reduces the refusal rates w.r.t. higher-bit models.
Figure 5: Example from the Ethics Evasive task. The immoral prompt includes an evasive sentence intended to mislead the LLM; the 4-bit AWQ model of LLAMA2 13b Chat successfully recognizes the immoral action, but the 3-bit model does not.
...and 20 more figures
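
The comparisons described in the captions of Figures 2 and 3 can be sketched as a per-dimension score difference against the dense 13b model, with a flag for drops over 5 points and for refusal-rate shifts over 10%. This is only an illustration: it assumes "relative score difference" means a plain subtraction in score points, and the function name, dictionary layout, and example numbers are hypothetical rather than taken from the paper.

```python
def compare_to_dense(dense, compressed, drop_threshold=5.0, refusal_threshold=10.0):
    """Compare a compressed model to its dense counterpart, dimension by dimension.

    `dense` and `compressed` map a dimension name to a dict with
    "score" and "refusal_rate" entries, both on a 0-100 scale.
    """
    report = {}
    for dim, base in dense.items():
        comp = compressed[dim]
        diff = comp["score"] - base["score"]  # positive means improvement
        refusal_shift = comp["refusal_rate"] - base["refusal_rate"]
        report[dim] = {
            "diff": diff,
            "large_drop": diff < -drop_threshold,
            "refusal_shift_flag": abs(refusal_shift) > refusal_threshold,
        }
    return report


# Placeholder numbers for illustration only; they are not results from the paper.
dense_13b = {
    "ethics": {"score": 60.0, "refusal_rate": 20.0},
    "fairness": {"score": 80.0, "refusal_rate": 5.0},
}
compressed_model = {
    "ethics": {"score": 66.0, "refusal_rate": 4.0},
    "fairness": {"score": 70.0, "refusal_rate": 6.0},
}
summary = compare_to_dense(dense_13b, compressed_model)
```

Keeping the refusal-rate flag next to the score difference mirrors the caution in Figure 2: an apparent gain in a dimension may partly reflect a change in how often the model refuses to answer.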
