Fair-GPTQ: Bias-Aware Quantization for Large Language Models

Irina Proskurina, Guillaume Metzler, Julien Velcin

TL;DR

Fair-GPTQ introduces a bias-aware quantization framework that integrates group-fairness constraints into weight quantization for large language models. By pairing stereotypical and anti-stereotypical inputs and deriving a bias term from their difference, it reduces gender, race, and religion stereotypes during generation while preserving at least 90% of baseline zero-shot accuracy and retaining the memory and speed benefits of 4-bit quantization. Empirical results show reduced bias on stereotype benchmarks and debiasing performance competitive with iterative methods across model families and instruction-tuned variants. The approach also reveals which weight types and layers contribute most to fairness during quantization, which enables further analysis and, potentially, gradient-guided compression. This work supports deploying efficient, fair LLMs in settings with strict memory and latency constraints.

Abstract

High memory demands of generative language models have drawn attention to quantization, which reduces computational cost, memory usage, and latency by mapping model weights to lower-precision integers. Approaches such as GPTQ effectively minimize input-weight product errors during quantization; however, recent empirical studies show that they can increase biased outputs and degrade performance on fairness benchmarks, and it remains unclear which specific weights cause this issue. In this work, we draw new links between quantization and model fairness by adding explicit group-fairness constraints to the quantization objective and introduce Fair-GPTQ, the first quantization method explicitly designed to reduce unfairness in large language models. The added constraints guide the learning of the rounding operation toward less-biased text generation for protected groups. Specifically, we focus on stereotype generation involving occupational bias and discriminatory language spanning gender, race, and religion. Fair-GPTQ has minimal impact on performance, preserving at least 90% of baseline accuracy on zero-shot benchmarks, reduces unfairness relative to a half-precision model, and retains the memory and speed benefits of 4-bit quantization. We also compare the performance of Fair-GPTQ with existing debiasing methods and find that it achieves performance on par with the iterative null-space projection debiasing approach on racial-stereotype benchmarks. Overall, the results validate our theoretical solution to the quantization problem with a group-bias term, highlight its applicability for reducing group bias at quantization time in generative models, and demonstrate that our approach can further be used to analyze channel- and weight-level contributions to fairness during quantization.
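To make "mapping model weights to lower-precision integers" and "input-weight product errors" concrete, here is a minimal sketch of plain round-to-nearest 4-bit quantization with the layer-output error it induces. It is not GPTQ or Fair-GPTQ, which choose roundings more carefully. The function names, the per-row scaling scheme, and the random data are assumptions made for illustration.

```python
import numpy as np

def quantize_rtn(W, bits=4):
    """Round-to-nearest asymmetric quantization with one scale per row.

    Hypothetical helper for illustration; not the paper's method.
    """
    qmax = 2**bits - 1                              # 15 for 4-bit
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard constant rows
    q = np.clip(np.round((W - w_min) / scale), 0, qmax).astype(np.uint8)
    W_hat = q * scale + w_min                       # dequantized weights
    return q, W_hat

def layer_output_error(W, W_hat, X):
    """Squared Frobenius error of the input-weight product, ||WX - W_hat X||^2."""
    return float(np.sum((W @ X - W_hat @ X) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))     # (out_features, in_features), toy sizes
X = rng.normal(size=(16, 32))    # (in_features, n_tokens), toy calibration inputs
q, W_hat = quantize_rtn(W, bits=4)
err = layer_output_error(W, W_hat, X)
```

GPTQ-style methods minimize this output error rather than the weight error alone, which is why calibration inputs `X` enter the objective.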

Paper Structure

This paper contains 43 sections, 2 theorems, 25 equations, 4 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

For the optimization problem in Eq. (eq:optim-gptq), with row-wise flattening $\mathbf{w}=\operatorname{vec_r}(\mathbf{W})$, the solution is stated in terms of $\mathbf{e}_q\in\mathbb{R}^{nd}$, the $q$-th standard basis vector, and $[\mathbf{H}_\mathbf{w}^{-1}]_{qq}$, the $q$-th diagonal element of $\mathbf{H}_\mathbf{w}^{-1}$. The proposition also gives the corresponding change of the quadratic objective.
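For orientation, the classical Optimal Brain Surgeon closed form that GPTQ builds on has the shape below. This is a sketch under assumptions not confirmed by the text here: the objective is taken to be $\tfrac{1}{2}\,\delta\mathbf{w}^{\top}\mathbf{H}_\mathbf{w}\,\delta\mathbf{w}$ subject to $\mathbf{e}_q^{\top}\delta\mathbf{w} + w_q = \operatorname{quant}(w_q)$, and the symbols $\delta\mathbf{w}$ and $\operatorname{quant}(\cdot)$ are introduced only for illustration. The paper's own statement may differ in constants or notation.

$$
\delta\mathbf{w} = -\frac{w_q - \operatorname{quant}(w_q)}{[\mathbf{H}_\mathbf{w}^{-1}]_{qq}}\,\mathbf{H}_\mathbf{w}^{-1}\mathbf{e}_q,
\qquad
\tfrac{1}{2}\,\delta\mathbf{w}^{\top}\mathbf{H}_\mathbf{w}\,\delta\mathbf{w} = \frac{\bigl(w_q - \operatorname{quant}(w_q)\bigr)^{2}}{2\,[\mathbf{H}_\mathbf{w}^{-1}]_{qq}}
$$

Under these assumptions, the first expression sets coordinate $q$ exactly to its quantized value, since $\mathbf{e}_q^{\top}\delta\mathbf{w} = -(w_q - \operatorname{quant}(w_q))$, and the second follows by substituting $\delta\mathbf{w}$ back into the quadratic form.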

Figures (4)

  • Figure 1: Illustration of the proposed Fair-GPTQ method. We quantize LMs using paired stereotypical ($\mathbf{X}_{0}$) and anti-stereotypical ($\mathbf{X}_{1}$) inputs, selecting weight roundings that reduce differences in model behavior across each pair. This is achieved by an optimization objective with a correction term based on the paired-input difference, $\Delta\mathbf{W} = -2\alpha\, \mathbf{W}\, \Delta\mathbf{X} \Delta\mathbf{X}^{\!\top}\mathbf{H}^{-1}$, where $\Delta\mathbf{X} = \mathbf{X}_0 - \mathbf{X}_1$, producing fairness-aware quantized weights.
  • Figure 2: Accuracy and bias scores across 6 categories for the quantized OPT-6.7B models (GPTQ-SS and Fair-GPTQ) evaluated on the BBQ dataset, split by context type $a \in \mathcal{A}$.
  • Figure 3: CrowS stereotype scores and perplexity for Fair-GPTQ in the Lower setting across different OPT model sizes.
  • Figure 4: Relative weight updates obtained using the proposed Fair-GPTQ method, compared with FP16, for OPT-6.7B and Mistral-7B at the matrix level.
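The correction term in the Figure 1 caption, $\Delta\mathbf{W} = -2\alpha\, \mathbf{W}\, \Delta\mathbf{X} \Delta\mathbf{X}^{\!\top}\mathbf{H}^{-1}$, can be sketched in NumPy as follows. The caption does not define $\mathbf{H}$, so this sketch assumes the standard GPTQ layer Hessian $2\mathbf{X}\mathbf{X}^{\top}$ with a small diagonal damping term. The function name, the `damp` parameter, the choice of calibration inputs, and the toy shapes are all hypothetical.

```python
import numpy as np

def fairness_correction(W, X0, X1, X_calib, alpha=0.1, damp=0.01):
    """Compute dW = -2 * alpha * W @ dX @ dX.T @ inv(H).

    W       : (out_features, in_features) layer weights
    X0, X1  : (in_features, n) stereotypical / anti-stereotypical inputs
    X_calib : (in_features, m) inputs used to build H (an assumption here)
    """
    dX = X0 - X1                                   # paired-input difference
    # ASSUMPTION: H is the usual GPTQ Hessian 2 X X^T plus damping;
    # the paper's exact definition of H may differ.
    H = 2.0 * X_calib @ X_calib.T
    H = H + damp * np.mean(np.diag(H)) * np.eye(H.shape[0])
    M = W @ dX @ dX.T                              # (out_features, in_features)
    # H is symmetric, so M @ inv(H) == solve(H, M.T).T (avoids an explicit inverse)
    return -2.0 * alpha * np.linalg.solve(H, M.T).T

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
X0 = rng.normal(size=(6, 10))
X1 = rng.normal(size=(6, 10))
X_calib = np.concatenate([X0, X1], axis=1)         # toy choice of calibration set
dW = fairness_correction(W, X0, X1, X_calib, alpha=0.1)
```

The update vanishes when $\alpha = 0$ or when the paired inputs are identical, so under this sketch the correction acts only where the stereotypical and anti-stereotypical inputs differ.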

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2: Weight updates and their impact
  • proof