
Advancing Fairness in Natural Language Processing: From Traditional Methods to Explainability

Fanny Jourdan

TL;DR

This thesis introduces an innovative algorithm to mitigate biases in multi-class classifiers, tailored for high-risk NLP applications, that surpasses traditional methods in both bias mitigation and prediction accuracy. It also introduces TaCo, a novel method to neutralize bias in Transformer model embeddings.

Abstract

The rapidly growing field of Natural Language Processing (NLP) stands at a critical juncture where integrating fairness into its frameworks has become imperative. This PhD thesis addresses the need for equity and transparency in NLP systems, recognizing that fairness in NLP is not merely a technical challenge but a moral and ethical necessity, requiring a rigorous examination of how these technologies interact with and affect diverse human populations. From this perspective, the thesis investigates both the development of equitable NLP methodologies and the evaluation of biases that prevail in current systems. First, it introduces an innovative algorithm to mitigate biases in multi-class classifiers, tailored for high-risk NLP applications, which surpasses traditional methods in both bias mitigation and prediction accuracy. Then, an analysis of the Bios dataset reveals the impact of dataset size on discriminatory biases and the limitations of standard fairness metrics. These findings motivate an exploration of explainable AI, aiming for a more complete understanding of biases where traditional metrics fall short. Accordingly, the thesis presents COCKATIEL, a model-agnostic explainability method that identifies and ranks concepts in Transformer models, outperforming previous approaches on sentiment analysis tasks. Finally, the thesis helps bridge the gap between fairness and explainability by introducing TaCo, a novel method to neutralize bias in Transformer model embeddings. In conclusion, this thesis constitutes a significant interdisciplinary endeavor that intertwines explainability and fairness to challenge and reshape current NLP paradigms. The methodologies and critiques presented contribute to the ongoing discourse on fairness in machine learning, offering actionable solutions for more equitable and responsible AI systems.

Paper Structure

This paper contains 160 sections, 29 equations, 41 figures, 1 table, 1 algorithm.

Figures (41)

  • Figure 1: Number of biographies for each occupation by gender in the full Bios dataset de2019bias.
  • Figure 2: Co-occurrence of occupation and gender in the Bios dataset de2019bias. Predicting gender solely from occupation yields at least 62% accuracy, which is the minimum baseline for an accurate but unbiased model. Any classifier with lower accuracy on gender prediction must have an occupation prediction accuracy below $100\%$.
  • Figure 3: Number of reviews for each label (positive or negative) in the IMDB dataset (left) and the BEER dataset (right).
  • Figure 4: Example of a positive review from the BEER dataset with human annotations.
  • Figure 5: Pipelines followed to define the three classifiers compared in the results section.
  • ...and 36 more figures
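The Figure 2 caption rests on a simple computation: the best predictor of gender that only sees the occupation outputs, for each occupation, that occupation's majority gender, so its accuracy is the sum of the per-occupation majority counts divided by the total. The sketch below assumes this reading of the caption; the function name `occupation_gender_baseline` and the toy counts are hypothetical and are not the Bios statistics.

```python
from collections import Counter


def occupation_gender_baseline(pairs):
    """Accuracy of predicting gender from occupation alone.

    pairs: iterable of (occupation, gender) tuples.
    For each occupation, the optimal occupation-only predictor outputs the
    majority gender of that occupation, so its accuracy is the sum of the
    majority counts divided by the number of examples.
    """
    counts = Counter(pairs)  # (occupation, gender) -> count
    majority = {}
    for (occupation, _gender), n in counts.items():
        majority[occupation] = max(majority.get(occupation, 0), n)
    return sum(majority.values()) / sum(counts.values())


# Hypothetical toy data (NOT the Bios counts): 8 nurse bios, 10 surgeon bios.
toy = (
    [("nurse", "F")] * 7 + [("nurse", "M")] * 1
    + [("surgeon", "M")] * 8 + [("surgeon", "F")] * 2
)
print(occupation_gender_baseline(toy))
```

On the toy data the baseline is (7 + 8) / 18, about 0.83. The same logic underlies the caption's last sentence: a classifier that predicted occupation perfectly could be composed with this majority mapping to reach the baseline gender accuracy, so any model falling below that baseline cannot be perfect on occupation.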

Theorems & Definitions (2)

  • Definition 7.3.1: Sobol indices
  • Definition 7.3.2: Total Sobol indices
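In their standard form, the first-order Sobol index of an input $X_i$ is $S_i = \mathrm{Var}(\mathbb{E}[Y \mid X_i]) / \mathrm{Var}(Y)$, and the total index is $S_{T_i} = \mathbb{E}[\mathrm{Var}(Y \mid X_{\sim i})] / \mathrm{Var}(Y)$, where $X_{\sim i}$ denotes all inputs except $X_i$. The sketch below estimates both by Monte Carlo using the classic Saltelli (first-order) and Jansen (total) estimators. It assumes independent inputs drawn uniformly on [0, 1] and a vectorized model `f`. It is a generic illustration, not necessarily the estimator or sampling scheme used in the thesis for ranking concepts.

```python
import numpy as np


def sobol_indices(f, dim, n=100_000, seed=0):
    """Monte Carlo estimates of first-order and total Sobol indices.

    f: vectorized function mapping an (n, dim) array to an (n,) array.
    Assumption: inputs are independent and uniform on [0, 1].
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, dim))  # two independent sample matrices
    B = rng.random((n, dim))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))  # estimate of Var(Y)
    first, total = np.empty(dim), np.empty(dim)
    for i in range(dim):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i resampled from B
        fABi = f(ABi)
        # Saltelli (2010) estimator of Var(E[Y | X_i])
        first[i] = np.mean(fB * (fABi - fA)) / var_y
        # Jansen (1999) estimator of E[Var(Y | X_~i)]
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var_y
    return first, total


# Additive toy model Y = X0 + 2 * X1: analytically S = S_T = (0.2, 0.8).
S, ST = sobol_indices(lambda x: x[:, 0] + 2 * x[:, 1], dim=2)
print(S, ST)
```

For an additive model without interactions, first-order and total indices coincide; a gap $S_{T_i} - S_i$ signals that $X_i$ contributes through interactions with other inputs, which is why total indices are often preferred for ranking.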