Analyzing Correlations Between Intrinsic and Extrinsic Bias Metrics of Static Word Embeddings With Their Measuring Biases Aligned

Taisei Katô, Yusuke Miyao

TL;DR

This work reexamines the link between intrinsic bias metrics for static word embeddings (WEAT, RNSB) and the extrinsic bias observed in downstream NLP tasks. The authors extract bias-representing word sets from the datasets behind extrinsic metrics (e.g., WinoBias, hate speech detection (HSD)) and apply the intrinsic metrics to these aligned word sets, testing whether intrinsic measures can predict biased behavior. Across 45 embeddings with varied bias levels, they find strong correlations in some aligned settings (notably WEAT with WinoBias) but weak or inconsistent correlations in others (notably HSD). The results suggest that intrinsic metrics can be informative in specific contexts but should not replace extrinsic bias evaluation, particularly for biases with complex societal implications. The study also argues that intrinsic and extrinsic metrics must measure the same underlying bias for a correlation analysis between them to be meaningful.

Abstract

We examine the ability of intrinsic bias metrics of static word embeddings to predict whether Natural Language Processing (NLP) systems exhibit biased behavior. A word embedding is one of the fundamental NLP technologies that represents the meanings of words through real vectors, and problematically, it also learns social biases such as stereotypes. An intrinsic bias metric measures bias by examining a characteristic of the vectors, while an extrinsic bias metric checks whether an NLP system trained with a word embedding is biased. A previous study found that a common intrinsic bias metric usually does not correlate with extrinsic bias metrics. However, the intrinsic and extrinsic bias metrics did not measure the same bias in most cases, which makes us question whether the lack of correlation is genuine. In this paper, we extract characteristic words from the datasets of extrinsic bias metrics and compute intrinsic bias metrics on those words, so that both metrics measure the same bias when we analyze their correlations. We observed moderate to high correlations with some extrinsic bias metrics but little to no correlation with the others. This result suggests that intrinsic bias metrics can predict biased behavior in particular settings but not in others. The experiment code is available on GitHub.
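
To make the notion of an intrinsic metric concrete, the following is a minimal sketch of the WEAT effect size in its commonly cited form (Caliskan et al., 2017): the difference in mean association of two target word sets with two attribute word sets, divided by the standard deviation of the associations over all target words. The two-dimensional vectors and the word lists are invented for illustration only. The use of the sample standard deviation (`ddof=1`) is an assumption, and the paper's exact variant and its extracted word sets may differ.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, normalized by the
    standard deviation of associations over all target words (X and Y)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

# Toy 2-d "embeddings", invented purely for illustration (not real data).
emb = {
    "doctor": np.array([0.9, 0.1]), "engineer": np.array([0.8, 0.3]),
    "nurse": np.array([0.1, 0.9]), "teacher": np.array([0.3, 0.8]),
    "he": np.array([1.0, 0.0]), "him": np.array([0.9, 0.2]),
    "she": np.array([0.0, 1.0]), "her": np.array([0.2, 0.9]),
}
X = [emb["doctor"], emb["engineer"]]   # target set 1
Y = [emb["nurse"], emb["teacher"]]     # target set 2
A = [emb["he"], emb["him"]]            # attribute set 1
B = [emb["she"], emb["her"]]           # attribute set 2

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size on toy data: {effect:.3f}")
```

In the setting the abstract describes, the target and attribute sets would come from the words extracted from an extrinsic dataset rather than from hand-picked lists like the ones above.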

Paper Structure

This paper contains 47 sections, 6 equations, 1 figure, 20 tables.

Figures (1)

  • Figure 1: Correlations between intrinsic bias metrics (x-axis) and extrinsic bias metrics (y-axis). The intrinsic bias metrics are measured with the corresponding extracted word sets. The word embedding algorithm used is word2vec. For original (not bias-modified) word embeddings, we train NLP models ten times so that we can calculate standard deviations of extrinsic bias metrics, which are shown as error bars.
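
The kind of comparison plotted in Figure 1 can be sketched as follows: one intrinsic score per embedding on the x-axis, the mean extrinsic score over repeated training runs on the y-axis, and the per-embedding standard deviation as the error bar. All numbers below are synthetic placeholders generated in the script, not results from the paper. The choice of the Pearson coefficient is an assumption, since this page does not state which coefficient the authors use, and the sketch repeats training for every embedding for simplicity, whereas the caption describes ten runs only for the original embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

n_embeddings, n_runs = 8, 10

# Synthetic intrinsic scores, one per embedding (placeholder values).
intrinsic = np.linspace(0.2, 1.6, n_embeddings)

# Synthetic extrinsic scores: n_runs training runs per embedding,
# constructed to depend linearly on the intrinsic score plus noise.
runs = 0.5 * intrinsic[:, None] + rng.normal(0.0, 0.05, size=(n_embeddings, n_runs))

extrinsic_mean = runs.mean(axis=1)        # y-value plotted per embedding
extrinsic_std = runs.std(axis=1, ddof=1)  # error bar per embedding

# Pearson correlation between intrinsic scores and mean extrinsic scores.
r = np.corrcoef(intrinsic, extrinsic_mean)[0, 1]
print(f"Pearson r on synthetic data: {r:.3f}")
```

A rank-based coefficient could be substituted if only the ordering of embeddings by bias is of interest.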