Should We Attend More or Less? Modulating Attention for Fairness

Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar

TL;DR

Transformer-based NLP models exhibit social biases that hinder deployment. This work proposes entropy-based attention temperature scaling (EAT), a post-training intra-processing method that modulates attention entropy to improve fairness with minimal accuracy loss. Across text classification and generation, EAT improves demographic parity and reduces bias while maintaining performance, outperforming EAR and other baselines and enabling efficient bias mitigation. The approach generalizes across models and biases, offering a practical tool for fair NLP systems.

Abstract

The advances in natural language processing (NLP) pose both opportunities and challenges. While recent progress enables the development of high-performing models for a variety of tasks, it also poses the risk of models learning harmful biases from the data, such as gender stereotypes. In this work, we investigate the role of attention, a widely used technique in current state-of-the-art NLP models, in the propagation of social biases. Specifically, we study the relationship between the entropy of the attention distribution and the model's performance and fairness. We then propose a novel method for modulating attention weights to improve model fairness after training. Since our method is only applied post-training and pre-inference, it is an intra-processing method and is, therefore, less computationally expensive than existing in-processing and pre-processing approaches. Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks using language models of varying sizes. WARNING: This work uses language that is offensive.
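
To make the link between a temperature scaling factor and attention entropy concrete, here is a minimal sketch. It assumes the scaling factor $\beta$ multiplies the pre-softmax attention scores, which is one common way to apply a temperature and may differ in detail from the paper's formulation. The function names and the example scores are hypothetical and chosen only for illustration.

```python
import math


def scaled_softmax(scores, beta=1.0):
    """Softmax over beta * scores (assumed form of temperature scaling).

    beta = 1.0 leaves the original distribution unchanged.
    """
    scaled = [beta * s for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a probability distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical pre-softmax attention scores for one query over four tokens.
scores = [2.0, 1.0, 0.5, -1.0]

# Entropy of the attention distribution for several scaling factors.
entropies = {
    beta: entropy(scaled_softmax(scores, beta)) for beta in (0.5, 1.0, 2.0)
}
```

Under this assumption, a factor below 1 flattens the distribution and raises its entropy, while a factor above 1 sharpens it and lowers the entropy, so `entropies[0.5] > entropies[1.0] > entropies[2.0]`.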

Paper Structure (29 sections, 7 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: An example showing the effect of varying the temperature scaling factor $\beta$ on the attention map's distribution. Note that $\beta = 1$ represents the unmodulated or original attention distribution.
  • Figure 2: Percentage change in attention entropy, demographic parity (DP), and AUC of BERT and RoBERTa using different temperature scaling factors ($\beta$) on three datasets, compared to the unmodulated model (i.e., $\beta = 1$). Higher DP values indicate fairer models. Values of $\beta$ smaller or larger than $1$ correspond to maximizing or minimizing the attention entropy, respectively. Best viewed in color.
  • Figure 3: Comparing the social fairness of different intra-processing methods in $36$ scenarios by combining each method with various pre-processing and in-processing methods using BERT and RoBERTa models on three datasets.
  • Figure 4: Percentage change in toxicity on the BOLD dataset for different GPT-Neo sizes using EAT with different values of $\beta$, relative to the unmodulated baseline model with $\beta = 1$.
  • Figure 5: Perplexity of EAT (solid) and random perturbation (dashed) on Wikitext-2 against $\beta$ using GPT-Neo with $1.3$ and $2.7$ billion parameters.
  • ...and 4 more figures