Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

Yang Liu

TL;DR

This work addresses a robustness gap in evaluating social biases of masked language models when data is limited. By representing the stereotypical and anti-stereotypical PLL score sets as Gaussian distributions, the authors develop two information-theoretic measures, $KLS$ based on KL divergence and $JSS$ based on JS divergence, to compare these distributions across bias types. The measures are extended with a dispersion term $\Delta\sigma$ and bias-type weighting, and validated on StereoSet and CrowS-Pairs against established PLL-based baselines. Results show that the distribution-based measures are more robust and interpretable than indicator-function approaches, particularly on smaller datasets, and give a clearer view of model biases, with practical implications for bias mitigation.

Abstract

Many evaluation measures are used to evaluate social biases in masked language models (MLMs). However, we find that these previously proposed measures lack robustness when datasets are limited. This is because they are obtained by comparing the pseudo-log-likelihood (PLL) scores of the stereotypical and anti-stereotypical samples using an indicator function. The disadvantage of this approach is that it makes limited use of the PLL score sets and does not capture their distributional information. In this paper, we represent a PLL score set as a Gaussian distribution and use Kullback-Leibler (KL) divergence and Jensen-Shannon (JS) divergence to construct evaluation measures for the distributions of stereotypical and anti-stereotypical PLL scores. Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are significantly more robust and interpretable than those proposed previously.

Paper Structure (15 sections, 11 equations, 5 figures, 5 tables)

Introduction
Methodology
Motivation
Proposed Evaluation Measure
Experiment
Setting
Baseline
Evaluation Result
Overall Bias Analysis
Specific Bias Types Analysis
Correlation Analysis
Robustness Study
PLL Score Analysis
Related Work
Conclusion

Figures (5)

Figure 1: Processes for evaluating social biases in MLMs.
Figure 2: Kernel density estimations and Gaussian distributions of PLL scores for BERT, RoBERTa, and ALBERT on the CP dataset.
Figure 3: A simple example from the distribution of PLL scores to KLS and JSS.
Figure 4: Pearson correlations between evaluation measures.
Figure 5: Experimental results of evaluation measures on different sampling rates on the CP dataset. Red circles indicate the occurrence of non-robustness.
