How Quantization Shapes Bias in Large Language Models

Federico Marcuzzi, Xuefei Ning, Roy Schwartz, Iryna Gurevych

TL;DR

This paper presents a large-scale analysis of how post-training quantization (PTQ) of weights and activations affects social bias in large language models. It evaluates three PTQ strategies, Activation-aware Weight Quantization (AWQ), Generalized Post-Training Quantization (GPTQ), and SmoothQuant (SQ), on models that differ in architecture family (LLaMA- and Qwen-based) and reasoning ability, using 13 benchmarks that cover stereotypes, fairness, toxicity, and sentiment for gender, race, and religion. The study finds that quantization can reduce toxicity and does not significantly affect sentiment, but tends to slightly increase stereotypes and unfairness in generative tasks, especially under aggressive compression. These trends are largely consistent across demographic categories and model types, and reasoning models generally exhibit lower bias levels. The findings highlight the need to balance the efficiency gains of quantization against ethical considerations, and to run fine-grained bias evaluations before deploying quantized LLMs in real-world settings.

Abstract

This work presents a comprehensive evaluation of how quantization affects model bias, with particular attention to its impact on individual demographic subgroups. We focus on weight and activation quantization strategies and examine their effects across a broad range of bias types, including stereotypes, fairness, toxicity, and sentiment. We employ both probability- and generated text-based metrics across 13 benchmarks and evaluate models that differ in architecture family and reasoning ability. Our findings show that quantization has a nuanced impact on bias: while it can reduce model toxicity and does not significantly impact sentiment, it tends to slightly increase stereotypes and unfairness in generative tasks, especially under aggressive compression. These trends are generally consistent across demographic categories and subgroups, and model types, although their magnitude depends on the specific setting. Overall, our results highlight the importance of carefully balancing efficiency and ethical considerations when applying quantization in practice.
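
To make "aggressive compression" concrete, the following is a minimal sketch of symmetric round-to-nearest quantization applied to a list of synthetic weights. It is not AWQ, GPTQ, or SmoothQuant, which add calibration and rescaling steps on top of this basic idea; the function names, the Gaussian weights, and the bit widths (8, 4, 3) are illustrative assumptions rather than settings taken from the paper. The sketch only shows that reconstruction error grows as the bit width shrinks.

```python
import random


def quantize_symmetric(values, bits):
    """Quantize floats to a signed integer grid, then map back to floats.

    Hypothetical helper for illustration: plain round-to-nearest with one
    scale per list, not any of the PTQ methods evaluated in the paper.
    """
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)                   # nearest integer level
        q = max(-qmax, min(qmax, q))           # clamp to the representable range
        dequantized.append(q * scale)
    return dequantized


def mean_abs_error(original, approx):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic stand-in for a weight matrix

# Lower bit widths leave fewer levels, so the rounding error increases.
errors = {
    bits: mean_abs_error(weights, quantize_symmetric(weights, bits))
    for bits in (8, 4, 3)
}
for bits, err in errors.items():
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```

The growing error at lower bit widths is the perturbation whose downstream effect on bias metrics the paper measures; the sketch says nothing about the direction of that effect.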

Paper Structure

This paper contains 52 sections, 4 equations, 36 figures, 22 tables.

Figures (36)

  • Figure 1: Historical Bias on WinoBias. Closer to 0 is better ($\to\!0$). The ${}^{*}$ denotes significant differences.
  • Figure 2: Quantization impact on DiscrimEvalGen (left: $\uparrow$, right: $\downarrow$). The ${}^{*}$ denotes significant differences.
  • Figure 3: StereotypeScore per category on StereoSet. Closer to 50 is better ($\to\!50$).
  • Figure 4: Toxicity per category on BOLD ($\downarrow$).
  • Figure D.1: Analysis of the correlation between toxicity, quantization, and average generation length on BOLD. Multiple gray points represent generations from the original model under different constraints on the maximum number of output tokens.
  • ...and 31 more figures
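
The "closer to 50 is better" convention in the Figure 3 caption can be illustrated with a short sketch. It assumes the standard StereoSet definition, in which the StereotypeScore is the percentage of examples where the model prefers the stereotypical continuation over the anti-stereotypical one; the function names and the log-probability values below are hypothetical, and the paper's exact implementation may differ.

```python
def stereotype_score(pairs):
    """Percentage of pairs where the stereotypical option scores higher.

    `pairs` is a list of (logp_stereotypical, logp_anti_stereotypical)
    tuples. This is an illustrative helper, not the paper's code.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * preferred / len(pairs)


def distance_from_ideal(score):
    """Absolute gap to 50, the value at which neither option is favoured."""
    return abs(score - 50.0)


# Made-up log-probabilities for four sentence pairs.
example_pairs = [(-1.0, -2.0), (-3.0, -1.5), (-0.5, -0.7), (-2.0, -1.0)]
score = stereotype_score(example_pairs)
print(score, distance_from_ideal(score))
```

A score of 100 means the model always prefers the stereotype and a score of 0 means it always prefers the anti-stereotype, so both extremes indicate a systematic preference; this is why the target is 50 rather than 0.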