Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

Bar Iluz; Yanai Elazar; Asaf Yehudai; Gabriel Stanovsky

TL;DR

The paper investigates how intrinsic debiasing methods affect extrinsic bias in machine translation, and shows that downstream fairness depends on three design choices: what to debias, where in the model to apply debiasing, and the target language. It systematically evaluates Hard-Debiasing, INLP, and LEACE across the encoder and decoder embedding tables and across three token-selection schemes (all tokens, n-token-profession, and 1-token-profession) for German, Hebrew, and Russian, using WinoMT accuracy and BLEU as core metrics. Key findings: 1-token-profession debiasing often yields the best gender translation accuracy, the optimal embedding table depends on the debiasing method, and target-language morphology significantly affects outcomes. LEACE and Hard-Debiasing preserve BLEU better than INLP. The work provides practical guidance for integrating intrinsic debiasing into MT and highlights the need for broader evaluation across tasks and languages to achieve extrinsically fairer systems.
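To make "applying debiasing to an embedding table" concrete, the following is a minimal sketch of a single linear projection that removes one direction from selected rows of a table. It is a simplified illustration in the spirit of projection-based methods, not the paper's implementation of Hard-Debiasing, INLP, or LEACE. The toy embedding values, the `remove_direction` helper, and the use of "he" minus "she" as the gender direction are all assumptions made for this example.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def remove_direction(embeddings, direction, token_ids):
    """Project the selected rows onto the subspace orthogonal to `direction`.

    embeddings: list of vectors, one per vocabulary entry
    direction:  hypothetical gender direction (assumption for this sketch)
    token_ids:  rows to debias; all other rows are copied unchanged
    """
    d = unit(direction)
    selected = set(token_ids)
    debiased = []
    for i, row in enumerate(embeddings):
        if i in selected:
            coeff = dot(row, d)  # component of the row along the direction
            row = [x - coeff * dx for x, dx in zip(row, d)]
        else:
            row = list(row)
        debiased.append(row)
    return debiased

# Toy 3-dimensional table; the values are made up for illustration.
table = [
    [1.0, 0.0, 0.0],   # id 0: "he"
    [-1.0, 0.0, 0.0],  # id 1: "she"
    [0.5, 1.0, 0.0],   # id 2: "doctor"
    [-0.5, 0.0, 1.0],  # id 3: "nurse"
]
gender_direction = [a - b for a, b in zip(table[0], table[1])]
result = remove_direction(table, gender_direction, token_ids=[2, 3])
```

In this sketch, the same function could be pointed at any of the embedding tables the paper considers (encoder, decoder input, decoder output); which table and which `token_ids` to choose are exactly the design decisions the paper evaluates.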

Abstract

Most works on gender bias focus on intrinsic bias, that is, on removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications, which is the main motivation for debiasing in the first place. In this work, we systematically test how methods for intrinsic debiasing affect neural machine translation models, by measuring the extrinsic bias of such systems under different design choices. We highlight three challenges and mismatches between the debiasing techniques and their end-goal usage: the choice of embeddings to debias, the mismatch between debiasing words and debiasing sub-word tokens, and the effect on different target languages. We find that these considerations have a significant impact on downstream performance and the success of debiasing.
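The word versus sub-word mismatch can be illustrated with a small sketch of how one might pick which token ids to debias. The strategy names follow the TL;DR, but their exact definitions here are assumptions: "1-token-profession" is taken to mean professions the tokenizer keeps as a single token, and "n-token-profession" to mean every sub-word piece of any profession. The toy vocabulary, the segmentations, and the `select_token_ids` helper are hypothetical.

```python
def select_token_ids(professions, tokenize, vocab, strategy):
    """Return the set of token ids to debias under a given strategy.

    professions: list of profession words
    tokenize:    callable mapping a word to its list of sub-word pieces
    vocab:       dict mapping a sub-word piece to its token id
    strategy:    "all-tokens", "n-token-profession", or "1-token-profession"
                 (definitions are assumptions made for this sketch)
    """
    if strategy == "all-tokens":
        return set(vocab.values())
    if strategy not in ("n-token-profession", "1-token-profession"):
        raise ValueError(f"unknown strategy: {strategy}")
    ids = set()
    for word in professions:
        pieces = tokenize(word)
        if strategy == "1-token-profession":
            # Keep only professions that map to exactly one token.
            if len(pieces) == 1:
                ids.add(vocab[pieces[0]])
        else:
            # Keep every sub-word piece of every profession.
            ids.update(vocab[p] for p in pieces)
    return ids

# Toy vocabulary and segmentations, made up for illustration.
vocab = {"doctor": 0, "nurse": 1, "car": 2, "pen": 3, "ter": 4, "the": 5}
segmentations = {
    "doctor": ["doctor"],
    "nurse": ["nurse"],
    "carpenter": ["car", "pen", "ter"],
}
tokenize = lambda word: segmentations[word]
professions = ["doctor", "nurse", "carpenter"]

one_token = select_token_ids(professions, tokenize, vocab, "1-token-profession")
n_token = select_token_ids(professions, tokenize, vocab, "n-token-profession")
all_tokens = select_token_ids(professions, tokenize, vocab, "all-tokens")
```

In this toy setup, the n-token strategy also selects pieces such as "car" and "pen" that appear in unrelated words, which is one plausible reason debiasing sub-word pieces could behave differently from debiasing whole words.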

Paper Structure (22 sections, 2 figures, 3 tables)

Introduction
Background
Intrinsic debiasing methods.
The effect of debiasing on NMT.
Integrating Intrinsic Debiasing in MT
Which embedding to debias?
Which words to debias?
How does debiasing affect different languages?
Evaluation
Experimental Setup
MT model.
Metrics and datasets.
Results
Debiasing 1-token-profession professions outperforms other approaches.
The optimal embedding table to debias depends on the debiasing method.
...and 7 more sections

Figures (2)

Figure 1: A schematic view of a neural machine translation system, highlighting different possibilities for applying intrinsic debiasing techniques. We examine three considerations: (1) where to apply the debiasing; (2) which tokens to apply the debiasing to (e.g. only gender-indicative words or the entire vocabulary); and (3) the effect of different target languages.
Figure 2: The relation between the gender prediction accuracy difference (orange) and the BLEU difference (blue) between the original model (without any intervention) and the debiased model. Results for Hard-Debiasing are shown on the left, INLP in the middle, and LEACE on the right. For each method, we present the results for each location (Encoder, Decoder-input, and Decoder-output) and each language.
