First the worst: Finding better gender translations during beam search

Danielle Saunders, Rosie Sallis, Bill Byrne

TL;DR

The paper tackles gender mistranslation in neural MT, which beam search can exacerbate, by proposing inference-time remedies that leave the training data unchanged. It develops two complementary strategies: gender-constrained decoding, which produces more gender-diverse n-best lists, and gender-consistency reranking (using either oracle or automatically inferred entity information), which selects better translations from those lists. The results show gains in WinoMT accuracy across English-German, English-Spanish, and English-Hebrew, with pronounced improvements at larger beam sizes, and the gains hold when gender information is inferred with automatic coreference resolution. The approach is efficient (a single model, no retraining) and adaptable, including support for named entities and new gendered vocabulary through placeholder-based reranking. Overall, the work offers a practical, data-light way to reduce gender bias in MT at inference time.

Abstract

Neural machine translation inference procedures like beam search generate the most likely output under the model. This can exacerbate any demographic biases exhibited by the model. We focus on gender bias resulting from systematic errors in grammatical gender translation, which can lead to human referents being misrepresented or misgendered. Most approaches to this problem adjust the training data or the model. By contrast, we experiment with simply adjusting the inference procedure. We experiment with reranking n-best lists using gender features obtained automatically from the source sentence, and applying gender constraints while decoding to improve n-best list gender diversity. We find that a combination of these techniques allows large gains in WinoMT accuracy without requiring additional bilingual data or an additional NMT model.
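
The reranking idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The tiny Spanish lexicon, the pronoun lookup standing in for coreference-based gender inference, and the names `infer_gender`, `gender_score`, and `rerank` are all assumptions made for illustration. The sketch reorders an n-best list so that hypotheses whose gendered words agree with the gender inferred from the source come first, using model log-likelihood only to break ties.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon of gender-marked Spanish words (illustration only).
MASC = {"el", "doctor", "enfermero"}
FEM = {"la", "doctora", "enfermera"}

# Crude stand-in for coreference resolution: look for a gendered pronoun.
PRONOUN_GENDER = {"she": "f", "her": "f", "hers": "f",
                  "he": "m", "him": "m", "his": "m"}


def infer_gender(source: str) -> Optional[str]:
    """Return 'f' or 'm' from the first gendered pronoun in the source, else None."""
    for token in source.lower().replace(".", " ").replace(",", " ").split():
        if token in PRONOUN_GENDER:
            return PRONOUN_GENDER[token]
    return None


def gender_score(hypothesis: str, target_gender: str) -> int:
    """Count words agreeing with the target gender minus words disagreeing."""
    tokens = hypothesis.lower().split()
    wanted, unwanted = (FEM, MASC) if target_gender == "f" else (MASC, FEM)
    return sum(t in wanted for t in tokens) - sum(t in unwanted for t in tokens)


def rerank(nbest: List[Tuple[str, float]], target_gender: str) -> List[Tuple[str, float]]:
    """Sort by gender agreement first, then by log-likelihood (both descending)."""
    return sorted(
        nbest,
        key=lambda hyp: (gender_score(hyp[0], target_gender), hyp[1]),
        reverse=True,
    )


# Toy n-best list: (hypothesis, model log-likelihood), most likely first.
nbest = [
    ("el doctor terminó su turno", -1.2),
    ("la doctora terminó su turno", -1.9),
    ("el doctora terminó su turno", -3.5),
]

gender = infer_gender("The doctor finished her shift.")
if gender is not None:
    best, _ = rerank(nbest, gender)[0]
    print(best)
```

In this toy case the most likely hypothesis is masculine, so reranking only helps because a feminine hypothesis is already in the list. This is why the abstract pairs reranking with gender-constrained decoding, which aims to make the n-best list more gender-diverse.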

Paper Structure

This paper contains 15 sections, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Constraints for a toy initial hypothesis.
  • Figure 2: Complete workflow for a toy en-es example. We have two options for producing an n-best list - standard or gender-constrained search - and can then either take the highest likelihood output from the list, or rerank it.
  • Figure 3: WinoMT accuracy after oracle-reranking gender-constrained n-best lists, varying n.