Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach

Daiki Shirafuji, Makoto Takenaka, Shinya Taguchi

TL;DR

This work addresses social biases in language models arising from training data and proposes a Bias Vector method, inspired by task arithmetic, that mitigates biases without manually created debiasing data. By continually training base LMs on biased text and constructing a Bias Vector from the weight differences, the authors debias pre-trained models via $\theta_{debias} = \theta_{org} - \lambda V_{bias}$, excluding LayerNorm modules. Evaluations on SEAT across BERT, ALBERT, and RoBERTa show an average SEAT improvement of $0.177$ points at $\lambda=1$, with GLUE downstream performance largely preserved (average ~0.23% improvement) under the same setting; however, an overly large $\lambda$ can cause knowledge collapse and degrade task performance. The results demonstrate a scalable, data-efficient debiasing approach that reduces bias while maintaining utility, with future work exploring bias interaction, over-debiasing risks, and extension to large language models. The method offers a practical path toward fairer LMs: it enables bias mitigation without curated debiasing datasets and provides a framework for bias control via a tunable scaling factor $\lambda$.

Abstract

The use of language models (LMs) has increased considerably in recent years, and the biases and stereotypes in training data that are reflected in LM outputs are causing social problems. In this paper, inspired by task arithmetic, we propose the "Bias Vector" method for mitigating these LM biases. The Bias Vector method does not require manually created debiasing data. Our approach has three main steps: (1) continually training pre-trained LMs on biased data using masked language modeling; (2) constructing the Bias Vector as the difference between the weights of the biased LMs and those of the pre-trained LMs; and (3) subtracting the Bias Vector from the weights of the pre-trained LMs for debiasing. We evaluated the Bias Vector method on the SEAT across three LMs and confirmed an average improvement of 0.177 points. We demonstrated that the Bias Vector method does not degrade LM performance on downstream tasks in the GLUE benchmark. In addition, we examined how scaling factors, which control the magnitudes of Bias Vectors, affect effect sizes on the SEAT, and conducted a comprehensive evaluation of our debiased LMs across both the SEAT and GLUE benchmarks.
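Steps (2) and (3) amount to element-wise arithmetic on model weights, following $\theta_{debias} = \theta_{org} - \lambda V_{bias}$ with LayerNorm parameters left untouched. The following is a minimal sketch rather than the authors' implementation: it assumes weights are stored as a dictionary mapping parameter names to flat lists of floats, and the function names (`build_bias_vector`, `debias`), the parameter names, and the substring test for LayerNorm are all hypothetical choices made for illustration.

```python
# Toy "state dicts": parameter name -> flat list of floats (an assumption;
# a real model would hold tensors, but the arithmetic is the same).

def build_bias_vector(theta_org, theta_biased):
    """Step (2): V_bias = theta_biased - theta_org, per parameter."""
    return {
        name: [b - o for b, o in zip(theta_biased[name], theta_org[name])]
        for name in theta_org
    }

def debias(theta_org, v_bias, lam=1.0, skip_substrings=("LayerNorm",)):
    """Step (3): theta_debias = theta_org - lam * V_bias.

    Parameters whose name contains any of `skip_substrings` are copied
    unchanged, mirroring the exclusion of LayerNorm modules.
    """
    theta_debias = {}
    for name, weights in theta_org.items():
        if any(s in name for s in skip_substrings):
            theta_debias[name] = list(weights)
        else:
            theta_debias[name] = [
                w - lam * v for w, v in zip(weights, v_bias[name])
            ]
    return theta_debias

# Hypothetical weights before and after continual training on biased text.
theta_org = {
    "encoder.dense.weight": [0.5, -0.25],
    "encoder.LayerNorm.weight": [1.0, 1.0],
}
theta_biased = {
    "encoder.dense.weight": [0.75, -0.5],
    "encoder.LayerNorm.weight": [1.5, 0.5],
}

v_bias = build_bias_vector(theta_org, theta_biased)
theta_debias = debias(theta_org, v_bias, lam=1.0)
```

Setting `lam=0.0` returns the original weights, and larger values move the model further away from the biased direction, which is the knob the scaling-factor experiments vary.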

Paper Structure

This paper contains 34 sections, 6 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Overview of the Bias Vector method: (1) Training pre-trained LMs on biased data to create the biased models; (2) Subtracting the pre-trained LM weights from those of the biased models to construct the Bias Vectors; (3) Subtracting the Bias Vectors from the pre-trained LM weights to debias the models.
  • Figure 2: Variation of effect sizes on the SEAT with the scale factor $\lambda$. The dashed lines indicate the effect sizes on pre-trained LMs. The closer the effect size is to zero, the smaller the bias.
  • Figure 3: Effect sizes on the gender-biased SEAT dataset (SEAT-8) with varying $\lambda$. The effect sizes are computed as the average of scores across ten different seed values. The closer the effect size is to zero, the smaller the bias.
  • Figure 4: Effect sizes on the race-biased SEAT dataset (SEAT-5b) with varying $\lambda$. The effect sizes are computed as the average of scores across ten different seed values. The closer the effect size is to zero, the smaller the bias.
  • Figure 5: Effect sizes on gender bias tests in SEAT when varying the value of $\lambda$. The dashed lines indicate effect sizes on pre-trained LMs.
  • ...and 6 more figures
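The effect sizes plotted in these figures are SEAT scores. As a rough illustration of how such a score is computed, the sketch below implements the standard WEAT-style effect size over sentence embeddings: the difference in mean association between two target sets `X` and `Y`, divided by the standard deviation of the association over both sets. The function names and the toy two-dimensional embeddings are assumptions for illustration, and the use of the sample standard deviation is one common convention, not necessarily the one used in the paper.

```python
import math
import statistics

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """s(w, A, B): mean similarity to attribute set A minus that to B."""
    sim_a = statistics.mean([cosine(w, a) for a in A])
    sim_b = statistics.mean([cosine(w, b) for b in B])
    return sim_a - sim_b

def effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, normalized by the
    sample standard deviation of the association over X and Y combined."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)

# Toy embeddings (hypothetical): X leans toward A, Y leans toward B.
X = [[1.0, 0.0], [1.0, 0.1]]
Y = [[0.0, 1.0], [0.1, 1.0]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

d = effect_size(X, Y, A, B)
```

A value of `d` near zero means the two target sets are associated with the attribute sets about equally, which is why the captions read values closer to zero as smaller bias.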