The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing

Masahiro Kaneko; Danushka Bollegala; Timothy Baldwin

TL;DR

The paper investigates how bias evaluations and debiasing behave differently under fine-tuning versus in-context learning. By systematically comparing intrinsic and extrinsic bias scores across pre-training and downstream tasks, and by applying FT-based (CDA, ALT) and ICL-based (ZSD, FSD) debiasing across eight LaMini PLMs, it demonstrates that ICL yields higher correlations between pre-training and downstream bias assessments and incurs less performance loss. FT-based debiasing causes larger shifts in model outputs and greater degradation on downstream tasks, partly because it updates parameters more extensively. The findings suggest that caution is needed when extrapolating FT-based bias trends to ICL settings and advocate considering ICL as a viable debiasing approach that better preserves prior knowledge and downstream utility.

Abstract

The output tendencies of Pre-trained Language Models (PLMs) vary markedly before and after Fine-Tuning (FT) due to the updates to the model parameters. These divergences in output tendencies result in a gap in the social biases of PLMs. For example, there exists a low correlation between the intrinsic bias scores of a PLM and its extrinsic bias scores under FT-based debiasing methods. Additionally, applying FT-based debiasing methods to a PLM leads to a decline in performance on downstream tasks. On the other hand, PLMs trained on large datasets can learn without parameter updates via In-Context Learning (ICL) using prompts. ICL induces smaller changes to PLMs than FT-based debiasing methods do. Therefore, we hypothesize that the gap observed between pre-trained and FT models does not hold for debiasing methods that use ICL. In this study, we demonstrate that ICL-based debiasing methods show a higher correlation between intrinsic and extrinsic bias scores than FT-based methods. Moreover, the performance degradation due to debiasing is also lower in the ICL case than in the FT case.
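The comparison described above rests on correlating one bias score per model in the pre-training setting with one score per model in the downstream setting. The following is a minimal sketch of that computation, not the authors' implementation: the choice of Pearson's r is an assumption (the text here does not name the coefficient), and the function name `pearson` and the example values are hypothetical placeholders, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: each position holds the scores of the same model,
    e.g. xs[i] is an intrinsic bias score and ys[i] is an extrinsic
    bias score for model i.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic placeholder values only; they are not taken from the paper.
    intrinsic_scores = [0.1, 0.2, 0.3, 0.4]
    extrinsic_scores = [0.2, 0.1, 0.4, 0.3]
    print(pearson(intrinsic_scores, extrinsic_scores))
```

Under this reading, a higher coefficient for ICL-based debiasing than for FT-based debiasing would correspond to the smaller gap the abstract reports.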

Paper Structure (14 sections, 2 figures, 2 tables)

This paper contains 14 sections, 2 figures, 2 tables.

Introduction
Experiments
Bias Evaluations
Pre-training settings.
Downstream settings.
Debiasing Methods
Fine-tuning.
In-context learning.
Downstream Task Evaluations
Pre-trained Language Models
Correlation between Bias Evaluations in Pre-training and Downstream Tasks
Impact of Debiasing via Fine-tuning vs. ICL in Downstream Task Performance
Changing of Parameters in PLMs
Conclusion

Figures (2)

Figure 1: The gap in bias scores when evaluating and debiasing PLMs using FT- and ICL-based methods. Compared to ICL, FT yields a lower correlation between intrinsic and extrinsic bias scores (a) and a larger drop in downstream task performance (b).
Figure 2: Performance difference between original and debiased PLMs on the RACE, ANLI, and WB tasks. Here, PLMs are debiased using fine-tuning-based (CDA, ALT) and ICL-based methods.
