Mention Attention for Pronoun Translation
Gongbo Tang, Christian Hardmeier
TL;DR
The paper tackles pronoun translation in neural machine translation by addressing referential and gender/number agreement challenges across languages. It introduces a decoder-side mention attention mechanism that selectively attends to source mentions, guided by two mention classifiers trained jointly with translation losses. Empirical results on WMT17 English–German show improved APT scores, particularly for ambiguous pronouns, and a small BLEU gain, with mixed signals from contrastive evaluation. The approach demonstrates a practical pathway to improve pronoun translation without sacrificing overall translation quality, motivating broader cross-language evaluations and refinements in mention tagging.
Abstract
Most pronouns are referring expressions, so a translation system must resolve what each pronoun refers to, and pronoun usage diverges across languages. Handling these divergences and translating pronouns correctly is therefore a challenge in machine translation. Mentions are the candidate referents of pronouns and are more closely related to pronouns than general tokens are. We hypothesize that extracting additional mention features can help pronoun translation. We therefore introduce an additional mention attention module in the decoder that pays extra attention to source mentions but not to non-mention tokens. The mention attention module not only extracts features from source mentions but also considers target-side context, which benefits pronoun translation. We also introduce two mention classifiers that train the model to recognize mentions and whose outputs guide the mention attention. We conduct experiments on the WMT17 English-German translation task and evaluate our models on general translation and pronoun translation using BLEU, APT, and contrastive evaluation metrics. Our proposed model outperforms the baseline Transformer model in terms of APT and BLEU scores. This confirms our hypothesis that pronoun translation can be improved by paying additional attention to source mentions, and shows that the additional modules do not have a negative effect on general translation quality.
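To illustrate the idea of attending to source mentions only, the following is a minimal sketch and not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, a hard mask obtained by thresholding mention scores at 0.5, and hypothetical names (`mention_attention`, `mention_probs`); the abstract specifies none of these details.

```python
import numpy as np

def mention_attention(queries, enc_states, mention_probs, threshold=0.5):
    """Attention from decoder states over source positions tagged as mentions.

    queries:       (tgt_len, d) decoder states, carrying target-side context
    enc_states:    (src_len, d) encoder outputs
    mention_probs: (src_len,) scores in [0, 1] from a hypothetical mention
                   classifier (an assumption of this sketch)
    Returns the context vectors (tgt_len, d) and weights (tgt_len, src_len).
    """
    d = queries.shape[-1]
    is_mention = mention_probs >= threshold  # boolean mask over source tokens

    # Assumed fallback: with no mentions, contribute a zero context vector.
    if not is_mention.any():
        zero_weights = np.zeros((queries.shape[0], enc_states.shape[0]))
        return np.zeros_like(queries), zero_weights

    # Scaled dot-product scores; non-mention positions are masked out.
    scores = queries @ enc_states.T / np.sqrt(d)
    scores = np.where(is_mention[None, :], scores, -np.inf)

    # Numerically stable softmax over the source dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ enc_states, weights

# Toy usage: three source tokens, of which the first and third are mentions.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
e = rng.normal(size=(3, 4))
p = np.array([0.9, 0.1, 0.8])
context, attn = mention_attention(q, e, p)
```

In this sketch the non-mention token receives zero attention weight, so the resulting context vector is built from mention positions alone. A soft variant that scales scores by the classifier probabilities would be an equally plausible reading of "guide".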
