Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Panatchakorn Anantaprayoon; Masahiro Kaneko; Naoaki Okazaki

TL;DR

This work addresses a limitation of single-label bias evaluation in NLI by introducing NLI-CoAL, a framework that uses all three NLI labels (entailment, contradiction, neutral) to assess gender bias. It defines three evaluation data groups, pro-stereotypical (PS), anti-stereotypical (AS), and non-stereotypical (NS), and a corresponding bias score $s = \frac{e_p + c_a + (1 - n_n)}{3}$, which is compared against a baseline that uses only neutral outputs. The authors construct evaluation datasets in English, Japanese, and Chinese, and perform a meta-evaluation showing that NLI-CoAL distinguishes biased incorrect inferences from non-biased incorrect inferences more accurately than the baseline. Experiments on English, Japanese, and Chinese PLMs reveal language-specific bias tendencies, show that the measure is compatible across languages, and point to potential gaps in Chinese NLI learning. Overall, NLI-CoAL provides a more fine-grained, task-specific approach to measuring bias in NLI models, validated in three languages, with practical implications for fairer NLP systems.
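
The score above can be illustrated with a minimal sketch. It assumes, as the subscripts suggest, that $e_p$ is the entailment rate on the PS group, $c_a$ the contradiction rate on the AS group, and $n_n$ the neutral rate on the NS group. The function names, the toy predictions, and the exact form of the neutral-only baseline (one minus the overall neutral rate) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def label_rate(predictions: List[str], label: str) -> float:
    """Fraction of predictions equal to `label` (0.0 for an empty list)."""
    if not predictions:
        return 0.0
    return Counter(predictions)[label] / len(predictions)


def nli_coal_score(preds_by_group: Dict[str, List[str]]) -> float:
    """s = (e_p + c_a + (1 - n_n)) / 3, under the symbol reading assumed above."""
    e_p = label_rate(preds_by_group["PS"], "entailment")     # assumed: entailment rate on PS
    c_a = label_rate(preds_by_group["AS"], "contradiction")  # assumed: contradiction rate on AS
    n_n = label_rate(preds_by_group["NS"], "neutral")        # assumed: neutral rate on NS
    return (e_p + c_a + (1.0 - n_n)) / 3.0


def neutral_only_score(preds_by_group: Dict[str, List[str]]) -> float:
    """Hypothetical neutral-only baseline: every non-neutral output counts as bias."""
    all_preds = [p for preds in preds_by_group.values() for p in preds]
    return 1.0 - label_rate(all_preds, "neutral")


# Toy predictions (made up for illustration). The PS group contains one
# "contradiction" and the AS group one "entailment": both are incorrect,
# but neither matches the biased label for its group.
example = {
    "PS": ["entailment", "contradiction", "neutral", "neutral"],
    "AS": ["contradiction", "entailment", "neutral", "neutral"],
    "NS": ["neutral", "neutral", "neutral", "entailment"],
}

coal = nli_coal_score(example)
baseline = neutral_only_score(example)
print(f"NLI-CoAL score: {coal:.3f}, neutral-only baseline: {baseline:.3f}")
```

In this toy case the neutral-only baseline is higher than the NLI-CoAL score because it also counts the two incorrect outputs that do not match the biased label of their group, which is the distinction the summary describes.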

Abstract

Discriminatory gender biases have been found in Pre-trained Language Models (PLMs) for multiple languages. In Natural Language Inference (NLI), existing bias evaluation methods have focused on the prediction results of one specific label out of the three, such as neutral. However, such evaluation methods can be inaccurate because different biased inferences are associated with different prediction labels. To address this limitation, we propose a bias evaluation method for PLMs, called NLI-CoAL, which considers all three labels of the NLI task. First, we create three evaluation data groups that represent different types of biases. Then, we define a bias measure based on the corresponding label output of each data group. In the experiments, we introduce a meta-evaluation technique for NLI bias measures and use it to confirm that our bias measure distinguishes biased incorrect inferences from non-biased incorrect inferences better than the baseline, resulting in a more accurate bias evaluation. We create the datasets in English, Japanese, and Chinese, and validate the compatibility of our bias measure across these languages. Lastly, we observe the bias tendencies in PLMs of different languages. To our knowledge, we are the first to construct evaluation datasets and measure PLMs' bias from NLI in Japanese and Chinese.

Paper Structure (30 sections, 7 equations, 4 figures, 5 tables)

Introduction
Proposed Bias Evaluation Method: NLI-CoAL
Three Types of Gender Bias Evaluation Data
Pro-Stereotypical (PS).
Anti-Stereotypical (AS).
Non-Stereotypical (NS).
Bias Evaluation Measure
Baseline Bias Measure.
NLI-CoAL Bias Measure.
Bias Evaluation Datasets Creation
Meta-evaluation of NLI Bias Measures by Bias-Controlling
Bias-Controlling
Correlation Between Bias Rates and Bias Scores
Experiments
Meta-evaluation of Bias Evaluation Methods
...and 15 more sections

Figures (4)

Figure 1: Comparison of bias evaluation methods. While the existing method by Dev et al. (2020) considers all incorrect outputs as bias, our proposed method (NLI-CoAL) considers only biased and incorrect outputs.
Figure 2: Evaluation datasets creation
Figure 3: Summary of the meta-evaluation method for NLI bias measures
Figure 4: Plots between bias rate and score in PLMs from three different languages
