Effective Controllable Bias Mitigation for Classification and Retrieval using Gate Adapters

Shahed Masoudian, Cornelia Volaucnik, Markus Schedl, Navid Rekabsaz

TL;DR

The paper addresses debiasing language models with an intensity that can be controlled at inference time. It introduces ConGater, a modular gate adapter whose trajectory-based activation, $\text{t-sigmoid}_{\omega}$, gradually erases protected-attribute information as the sensitivity parameter $\omega \in [0,1]$ increases. ConGater is trained with a combination of a task loss $\mathcal{L}_{task}$ and an attribute loss $\mathcal{L}_{\rho_i}$, supports parallel or post-hoc training, and lets users move continuously between biased and debiased outputs by adjusting $\omega$ at inference. Across classification and information retrieval, ConGater reduces attribute leakage while preserving or even improving task performance relative to baselines, and it offers interpretable, user-controlled fairness–performance trade-offs, including improved neutrality of retrieval results with little loss in retrieval performance. The result is a single, adaptable model whose degree of bias reduction can be tailored to user needs and application contexts.
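
To make the role of $\omega$ concrete, the following is a minimal sketch of a t-sigmoid-style activation. The closed form used here, $1 - \log_2(\omega + 1) / (1 + e^{x})$, is an assumption chosen so that the gate is the constant 1 at $\omega = 0$ and the standard sigmoid at $\omega = 1$; the function names and sample inputs are illustrative and not taken from the authors' code.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def t_sigmoid(x: float, omega: float) -> float:
    """Trajectory-sigmoid sketch (assumed form, see lead-in).

    omega = 0 gives the constant 1 (open gate);
    omega = 1 gives sigmoid(x) (fully functional gate).
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return 1.0 - math.log2(omega + 1.0) / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Sweep omega at inference time for a few illustrative inputs.
    for omega in (0.0, 0.5, 1.0):
        values = [round(t_sigmoid(x, omega), 3) for x in (-2.0, 0.0, 2.0)]
        print(f"omega={omega}: {values}")
```

Because $\omega$ enters only through the scalar factor $\log_2(\omega + 1)$ in this sketch, the same trained gate can be evaluated at any value in $[0, 1]$ without retraining.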

Abstract

Bias mitigation in language models has been the topic of many studies, with a recent focus on learning separate modules such as adapters for on-demand debiasing. Besides optimizing for a modularized debiased model, it is often critical in practice to control the degree of bias reduction at inference time, e.g., to tune for a desired performance-fairness trade-off in search results or to control the strength of debiasing in classification tasks. In this paper, we introduce Controllable Gate Adapter (ConGater), a novel modular gating mechanism with adjustable sensitivity parameters, which allows for a gradual transition from the biased state of the model to the fully debiased version at inference time. We demonstrate ConGater's performance by (1) conducting adversarial debiasing experiments with three different models on three classification tasks with four protected attributes, and (2) reducing the bias of search results through fairness list-wise regularization, which enables adjusting the trade-off between performance and fairness metrics. Our experiments on the classification tasks show that, compared to baselines of the same caliber, ConGater can maintain higher task performance while containing less information regarding the attributes. Our results on the retrieval task show that the fully debiased ConGater can achieve the same fairness performance while maintaining more than twice the task performance of recent strong baselines. Overall, besides its strong performance, ConGater enables continuous transitioning between biased and debiased states of models, enhancing personalization of use and interpretability through controllability.

Paper Structure (24 sections, 12 equations, 19 figures, 5 tables, 1 algorithm)

Figures (19)

  • Figure 1: (a) The overall architecture of ConGater as an adjustable self-gate adapter network. (b) Effect of the $\omega$ parameter on t-sigmoid. Increasing $\omega$ results in a transition from the constant function $y=1$ (open gate) to the sigmoid function (fully functional gate).
  • Figure 2: Results of the ConGater models using BERT-Base when increasing the gating sensitivity $\omega$ from 0 (no effect) to 1 (full effect). Each trained model is evaluated multiple times on the various $\omega$ values adjusted at inference time. The left/right y-axis corresponds to the task performance and attribute probing results, respectively. The results show the continuous reduction in the information presence of the target concept, as $\omega$ increases.
  • Figure 3: Prediction probabilities of ConGater when gradually increasing $\omega$, for a female physician's biography, incorrectly classified as a nurse in the initial state. The figure illustrates how changing the strength of gender removal affects the model's decision, providing a higher degree of interpretability through controllability.
  • Figure 4: Fairness-performance trade-off between Ft, Adp, and ConGater. For the baselines, each point refers to a newly trained model, with color intensity indicating the value of the regularization coefficient $\lambda$. ConGater is trained only once, and each point indicates the evaluation at a given $\omega$ value.
  • Figure 5: The proposed simple ConGater extension for multiple attributes. The fusion gate is defined as the element-wise multiplication of the individual gates.
  • ...and 14 more figures
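
The fusion gate described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the t-sigmoid form $1 - \log_2(\omega + 1) / (1 + e^{x})$ is assumed from the two endpoints given in the Figure 1 caption, the gate is assumed to rescale a hidden vector element-wise, and the attribute names, pre-activation values, and hidden values are hypothetical.

```python
import math
from typing import List, Sequence


def t_sigmoid(x: float, omega: float) -> float:
    """Assumed t-sigmoid form: constant 1 at omega=0, sigmoid at omega=1."""
    return 1.0 - math.log2(omega + 1.0) / (1.0 + math.exp(x))


def attribute_gate(pre_activations: Sequence[float], omega: float) -> List[float]:
    """Gate vector for one attribute at sensitivity omega."""
    return [t_sigmoid(v, omega) for v in pre_activations]


def fuse_gates(gates: Sequence[Sequence[float]]) -> List[float]:
    """Fusion gate: element-wise product of the individual gate vectors."""
    if not gates:
        raise ValueError("at least one gate is required")
    fused = [1.0] * len(gates[0])
    for gate in gates:
        if len(gate) != len(fused):
            raise ValueError("all gates must have the same length")
        fused = [f * g for f, g in zip(fused, gate)]
    return fused


# Hypothetical pre-activations for two attributes and a hidden vector.
gender_pre = [-1.0, 0.5, 2.0]
age_pre = [0.0, -2.0, 1.0]
hidden = [0.3, -0.7, 1.2]

# Full gender gating, age gate left open (omega = 0).
fused = fuse_gates([attribute_gate(gender_pre, 1.0), attribute_gate(age_pre, 0.0)])
gated_hidden = [h * g for h, g in zip(hidden, fused)]
print(fused, gated_hidden)
```

With a multiplicative fusion of this kind, setting one attribute's $\omega$ to 0 makes its gate all ones, so that attribute drops out of the product and the remaining gates act alone.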