
MAFIA: Multi-Adapter Fused Inclusive LanguAge Models

Prachi Jain, Ashutosh Sathe, Varun Gumma, Kabir Ahuja, Sunayana Sitaram

TL;DR

MAFIA tackles social biases in pretrained language models by debiasing several bias dimensions at once, using modular adapters and a fusion mechanism. It combines diverse counterfactual data augmentation with per-bias adapters and an AdapterFusion layer to exploit interactions among gender, race, religion, and profession biases, yielding a model that is fairer while preserving task performance. Two central contributions are the useful fairness metric, $\Psi_{\textrm{dim}} = \rho \cdot \alpha (1 - \Delta_{\textrm{dim}})$, used to balance fairness and accuracy across dimensions, and the public release of the multilingual mBias-STS-B benchmark. The results show improvements on STS-B, Bias-STS-B, toxicity classification, and zero-shot cross-lingual transfer.
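To make the metric concrete, here is a minimal Python sketch that evaluates $\Psi_{\textrm{dim}}$ exactly as the formula is written. The summary does not define the symbols, so the reading used here is an assumption: $\rho$ is taken to be a task-performance score, $\alpha$ a scaling constant (defaulting to 1), and $\Delta_{\textrm{dim}}$ a bias measure in $[0, 1]$ for one dimension, where 0 means no measured bias. The function name and the example numbers are hypothetical and are not results from the paper.

```python
def useful_fairness(rho: float, delta_dim: float, alpha: float = 1.0) -> float:
    """Evaluate Psi_dim = rho * alpha * (1 - delta_dim).

    Assumed meanings (not defined in the summary itself):
      rho       -- task-performance score, e.g. a correlation on the task
      alpha     -- scaling constant
      delta_dim -- bias measure for one dimension, assumed to lie in [0, 1]
    """
    if not 0.0 <= delta_dim <= 1.0:
        raise ValueError("delta_dim is assumed to lie in [0, 1]")
    return rho * alpha * (1.0 - delta_dim)


# Hypothetical numbers, chosen only to illustrate the trade-off.
accurate_but_biased = useful_fairness(rho=0.9, delta_dim=0.5)
fair_but_inaccurate = useful_fairness(rho=0.4, delta_dim=0.0)
balanced = useful_fairness(rho=0.8, delta_dim=0.1)

# Under this reading, the balanced model scores highest of the three.
print(accurate_but_biased, fair_but_inaccurate, balanced)
```

Because the score is a product, a model cannot score well by being fair but inaccurate, or accurate but biased. Either factor near zero pulls $\Psi_{\textrm{dim}}$ down.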

Abstract

Pretrained Language Models (PLMs) are widely used in NLP for various tasks. Recent studies have identified various biases that such models exhibit and have proposed methods to correct these biases. However, most of the works address a limited set of bias dimensions independently such as gender, race, or religion. Moreover, the methods typically involve finetuning the full model to maintain the performance on the downstream task. In this work, we aim to modularly debias a pretrained language model across multiple dimensions. Previous works extensively explored debiasing PLMs using limited US-centric counterfactual data augmentation (CDA). We use structured knowledge and a large generative model to build a diverse CDA across multiple bias dimensions in a semi-automated way. We highlight how existing debiasing methods do not consider interactions between multiple societal biases and propose a debiasing model that exploits the synergy amongst various societal biases and enables multi-bias debiasing simultaneously. An extensive evaluation on multiple tasks and languages demonstrates the efficacy of our approach.

Paper Structure (30 sections, 1 equation, 3 figures, 12 tables)

Figures (3)

  • Figure 1: Steps to generate Counterfactual (CF) pairs for racial bias. Note that the technique can be similarly used for other biases.
  • Figure 2: A comprehensive summary of the various training strategies described. Only the components highlighted in green are finetuned in each case.
  • Figure 3: Score distributions on STS-B obtained from various models. The middle 3 plots correspond to $\textrm{iDeb}_{\scriptsize{\textrm{bias}}}$ baselines. All $\textrm{iDeb}_{\scriptsize{\textrm{bias}}}$ models output a significantly narrower score distribution, which can easily lead to better scores on Bias-STS-B but can lower performance on STS-B.