Fair Text Classification via Transferable Representations

Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, Christophe Gravier

TL;DR

This work tackles fairness in text classification by introducing Wasserstein Fair Classification (WFC), which minimizes the dependence between task representations and sensitive information using the Wasserstein Dependency Measure. A demonic proxy model predicts sensitive attributes from latent encodings, which enables differentiable regularization even when the true sensitive attributes are unavailable and supports cross-domain transfer via domain adaptation. Theoretical guarantees link the dependency measure to standard fairness metrics (Demographic Parity and Equality of Opportunity) and bound its relation to the true attribute. Experiments on Bios and Moji show competitive accuracy with strong fairness improvements, including successful cross-domain transfer. The approach works with encoder- or decoder-based architectures and supports varying sensitive-attribute settings, making it applicable under privacy and regulatory constraints.

Abstract

Group fairness is a central research topic in text classification, where reaching fair treatment between sensitive groups (e.g., women and men) remains an open challenge. We propose an approach that extends the use of the Wasserstein Dependency Measure for learning unbiased neural text classifiers. Given the challenge of distinguishing fair from unfair information in a text encoder, we draw inspiration from adversarial training by inducing independence between representations learned for the target label and those for a sensitive attribute. We further show that Domain Adaptation can be efficiently leveraged to remove the need for access to the sensitive attributes in the dataset we cure. We provide both theoretical and empirical evidence that our approach is well-founded.
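
For reference, the Wasserstein Dependency Measure is commonly defined as the Wasserstein-1 distance between a joint distribution and the product of its marginals. The notation below ($X$, $Y$, $P$, $Q$, $f$) is assumed for illustration and is not taken from this page; the second identity is the Kantorovich-Rubinstein dual form, where the supremum runs over 1-Lipschitz functions.

$$
I_W(X; Y) = W_1\big(p(x, y),\, p(x)\,p(y)\big),
\qquad
W_1(P, Q) = \sup_{\|f\|_{L} \le 1} \; \mathbb{E}_{u \sim P}[f(u)] - \mathbb{E}_{v \sim Q}[f(v)].
$$

Because $W_1$ is a metric, $I_W(X; Y) = 0$ exactly when the joint distribution factorizes, that is, when $X$ and $Y$ are independent. Taking $X = Z_y$ and $Y = Z_a$ gives the quantity that the abstract describes as being driven toward independence.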

Paper Structure

This paper contains 46 sections, 7 theorems, 50 equations, 4 figures, 12 tables.

Key Result

Lemma 1

Let $I_W$ be the Wasserstein dependency measure, and $A$, $Y$, $\hat{Y}$ be random variables corresponding to the sensitive attribute, the true label, and the predicted label, respectively. Let $\left\| \cdot \right\|_p$ be the ground metric for the Wasserstein 1-distance. We have that [...], with $|\cdot|$ denoting the absolute value.
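
To make the group fairness quantities concrete, the sketch below computes empirical Demographic Parity and Equality of Opportunity gaps from the three variables named in the lemma ($A$, $Y$, $\hat{Y}$). It assumes binary labels and a binary sensitive attribute and uses the standard textbook definitions; the function names are hypothetical, and the paper may average these gaps differently (for example, per class).

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the examples selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    if not selected:
        raise ValueError("no examples selected for this group")
    return sum(selected) / len(selected)


def demographic_parity_gap(y_hat, a):
    """|P(Y_hat = 1 | A = 0) - P(Y_hat = 1 | A = 1)| on a sample."""
    rate0 = positive_rate(y_hat, [g == 0 for g in a])
    rate1 = positive_rate(y_hat, [g == 1 for g in a])
    return abs(rate0 - rate1)


def equal_opportunity_gap(y_hat, y, a):
    """Same gap as above, restricted to examples whose true label is 1."""
    rate0 = positive_rate(y_hat, [g == 0 and t == 1 for g, t in zip(a, y)])
    rate1 = positive_rate(y_hat, [g == 1 and t == 1 for g, t in zip(a, y)])
    return abs(rate0 - rate1)


# Toy sample (made up for illustration): 4 examples per sensitive group.
a = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 0, 0, 1, 1, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0]

print(demographic_parity_gap(y_hat, a))
print(equal_opportunity_gap(y_hat, y, a))
```

Both gaps are zero for a classifier whose positive rate does not depend on the sensitive group, which is the situation that a small dependency measure is meant to encourage.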

Figures (4)

  • Figure 1: Architecture of our method. The top part illustrates the pre-training of the demonic model (red) with domain adaptation. The model is trained to predict the sensitive attribute on the source domain ($A_{\mathcal{S}}$) while minimizing the divergence between the hidden representations from the source and target domains ($Z_{\mathcal{S}}$ and $Z_{\mathcal{T}}$). The bottom part describes the WFC pipeline for a batch of size 4; at this stage, the demonic model is frozen. The data representation on the right shows how we enforce dependency or independence between $Z_y$ and $Z_a$. During inference, only the trained classifier (green) is retained to predict $Y$.
  • Figure 2: $I_W(Z_y, Z_a)$ and averaged fairness metrics over classes across training epochs. The values are averaged over 5 runs.
  • Figure 3: Representation of the MLP's layer.
  • Figure 8: Comparison between the use of representations of different MLP layers to compute the Wasserstein distance.
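
The pairing idea in the Figure 1 caption (dependent versus independent $Z_y$, $Z_a$ pairs within a batch of size 4) can be illustrated with a small standard-library sketch. It rests on assumptions that are not confirmed by this page: independent pairs are simulated by shuffling $Z_a$ within the batch, and the Wasserstein-1 distance is computed exactly by brute force over matchings instead of with a learned critic. The names `lp_distance`, `empirical_w1`, and `dependency_estimate` are hypothetical.

```python
import itertools
import random


def lp_distance(u, v, p=2):
    """Ground metric: L_p distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)


def empirical_w1(xs, ys, p=2):
    """Exact Wasserstein-1 distance between two uniform empirical measures
    with the same number of points, found by trying every one-to-one
    matching. The cost grows as n!, so this only suits tiny batches."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    n = len(xs)
    best = min(
        sum(lp_distance(xs[i], ys[perm[i]], p) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


def dependency_estimate(z_y, z_a, p=2, seed=0):
    """Crude batch estimate of I_W(Z_y, Z_a): W1 between the paired samples
    (joint) and the same z_y paired with shuffled z_a (assumed stand-in for
    the product of marginals)."""
    rng = random.Random(seed)
    shuffled = z_a[:]
    rng.shuffle(shuffled)
    joint = [zy + za for zy, za in zip(z_y, z_a)]
    product = [zy + za for zy, za in zip(z_y, shuffled)]
    return empirical_w1(joint, product, p)


# Batch of size 4 with made-up 1-D representations.
z_y = [[0.0], [0.1], [1.0], [1.1]]
z_a_dependent = [[0.0], [0.0], [1.0], [1.0]]   # tracks z_y closely
z_a_constant = [[0.5], [0.5], [0.5], [0.5]]    # carries no information

print(dependency_estimate(z_y, z_a_dependent))
print(dependency_estimate(z_y, z_a_constant))
```

When `z_a` is constant, shuffling changes nothing and the estimate is zero. A single shuffle on four points is a noisy estimate, so this is only meant to show the construction, not a usable training signal.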

Theorems & Definitions (7)

  • Lemma 1: Group fairness and Wasserstein Dependency Measure.
  • Lemma 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 6
  • Lemma 7