On Debiasing Text Embeddings Through Context Injection

Thomas Uriot

TL;DR

Higher-performing models are more prone to capturing biases, but they are also better at incorporating context. Surprisingly, while models can easily embed affirmative semantics, they fail at embedding neutral semantics.

Abstract

Current advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications leveraging textual data. Generally, the core of these applications relies on having a good semantic representation of text as vectors, obtained via embedding models. However, it has been shown that these embeddings capture and perpetuate biases already present in text. While a few techniques have been proposed to debias embeddings, they do not take advantage of the recent advances in context understanding of modern embedding models. In this paper, we fill this gap by reviewing 19 embedding models, quantifying their biases and how well they respond to context injection as a means of debiasing. We show that higher-performing models are more prone to capturing biases, but are also better at incorporating context. Surprisingly, we find that while models can easily embed affirmative semantics, they fail at embedding neutral semantics. Finally, in a retrieval task, we show that biases in embeddings can lead to undesirable outcomes. We use these insights to design a simple algorithm for top-$k$ retrieval, where $k$ is dynamically selected. We show that our algorithm is able to retrieve all relevant gendered and neutral chunks.

Paper Structure

This paper contains 28 sections, 6 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Top - The dashed line shows the averaged AUC scores (see Equation \ref{eq:auc}) across the learned concepts. A higher AUC means a better concept representation: stronger models (higher ranked on the MTEB) have learned a better concept direction ($\rho=0.76$). The solid lines show the average correlation (a proxy for bias strength) across the three concepts (gender, age, wealth), between human-labeled attributes and their projections onto the concept directions. The blue line shows that stronger models are more prone to containing biases (neutral, $\rho=0.79$). Bottom - The solid lines show the sample proportion $\hat{p}$ of the binomial test. The blue line can be interpreted as a measure of bias present in the embeddings. Models higher ranked on the MTEB are more prone to containing biases ($\rho = 0.77$). In green, values closer to $\hat{p}=0.5$ mean that debiasing has been more effective. The red and yellow lines show the ability of an embedding to correctly capture "affirmative" semantics. Several models (yellow line) fail to capture negative semantics.
  • Figure 2: Similarity matrix between the queries (y-axis) and the chunks (x-axis), using UAE-Large-V1 as the embedding model.
  • Figure 3: Area under the curve achieved for the principal component with the largest AUC, displayed on the x-axis, for the gender concept. The component on the y-axis is used only for plotting purposes and is not used in the AUC computation. On the left, the AUC is low, resulting in a poor linear separation of the two classes (yellow and blue). On the right, PC 3 is able to linearly separate the gender terms.
  • Figure 4: Area under the curve achieved for the principal component with the largest AUC, displayed on the x-axis, for the age concept. The component on the y-axis is used only for plotting purposes and is not used in the AUC computation. On the left, the AUC is low, resulting in a poor linear separation of the two classes (yellow and blue). On the right, PC 0 is able to separate the age terms much better.
  • Figure 5: Results for the gender concept and the occupations attribute for the positive context. The correlation obtained using the projection onto the gender direction is compared to the correlations obtained with $10^4$ random projections to compute a p-value. Note that in panel (e), the values on the x-axis are shifted further towards the female gender due to the effect of context.
  • ...and 2 more figures
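
The random-projection test described in the Figure 5 caption can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `projection_pvalue`, the use of Pearson correlation, the two-sided comparison on absolute values, and the add-one smoothing of the p-value are all assumptions made here for the sake of a runnable example.

```python
import numpy as np


def projection_pvalue(embeddings, labels, direction, n_random=10_000, seed=0):
    """Compare the correlation along a concept direction to random directions.

    embeddings: (n_samples, dim) array of attribute embeddings.
    labels:     (n_samples,) array of human-labeled scores.
    direction:  (dim,) concept direction (hypothetical input, e.g. gender).
    Returns (observed absolute Pearson correlation, empirical p-value).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels, dtype=float)
    rng = np.random.default_rng(seed)

    # Correlation between labels and projections onto the concept direction.
    unit = direction / np.linalg.norm(direction)
    observed = abs(np.corrcoef(embeddings @ unit, labels)[0, 1])

    # Draw random unit directions (assumption: isotropic Gaussian, normalised).
    random_dirs = rng.standard_normal((n_random, embeddings.shape[1]))
    random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)

    # Vectorised Pearson correlation of each random projection with the labels.
    proj = embeddings @ random_dirs.T            # shape (n_samples, n_random)
    proj_c = proj - proj.mean(axis=0)
    lab_c = labels - labels.mean()
    corr = (proj_c.T @ lab_c) / (
        np.linalg.norm(proj_c, axis=0) * np.linalg.norm(lab_c)
    )

    # Share of random directions that correlate at least as strongly,
    # with add-one smoothing so the p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(corr) >= observed)) / (n_random + 1)
    return observed, p_value
```

A small p-value suggests that the concept direction explains the labeled attribute better than an arbitrary direction in the embedding space would, which is how the caption's comparison against $10^4$ random projections is read here.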