Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis
TL;DR
The paper studies pronoun disambiguation in context-aware machine translation (MT) by examining how Transformer attention heads use contextual cues. It follows a three-part methodology: measuring how strongly each head attends to the relations relevant to a pronoun, correlating those attention scores with disambiguation accuracy, and modifying the heads' attention to test their causal impact. The analysis is applied to ContraPro and LCPT for English→German and English→French. The findings indicate that decoder attention over target-side context has the strongest influence on pronoun resolution, and that several heads yield measurable gains when their attention is modified or fine-tuned. However, many heads attend to these relations without affecting performance, which points to underutilized potential. Targeted fine-tuning of heads improves pronoun disambiguation accuracy by up to about 5 percentage points without sacrificing translation quality, which suggests practical ways to improve context usage in MT and offers insight into how Transformer heads contribute to context-dependent phenomena.
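The first two steps of the methodology summarized above (measuring head-level attention to a relation, then correlating it with disambiguation accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `relation_score` and `head_correlations`, the array layout `[heads, query_len, key_len]`, and the choice of a Pearson (point-biserial) correlation are all assumptions made for illustration.

```python
import numpy as np

def relation_score(attn, query_pos, key_positions):
    """Attention mass each head sends from one query token to a set of key tokens.

    attn: array of shape [heads, query_len, key_len] whose rows sum to 1
          (hypothetical layout; real models may expose attention differently).
    query_pos: index of the query token, e.g. the pronoun being predicted.
    key_positions: list of key indices, e.g. the tokens of a candidate antecedent.
    Returns an array of shape [heads].
    """
    return attn[:, query_pos, key_positions].sum(axis=-1)

def head_correlations(scores, correct):
    """Pearson correlation between per-example scores and correctness, per head.

    scores: array of shape [examples, heads] with relation scores.
    correct: array of shape [examples] with 1 for a correct pronoun, else 0.
    Heads whose scores (or the labels) have zero variance get correlation 0.
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=float)
    s = scores - scores.mean(axis=0)
    c = correct - correct.mean()
    numerator = (s * c[:, None]).sum(axis=0)
    denominator = np.sqrt((s ** 2).sum(axis=0) * (c ** 2).sum())
    return np.divide(numerator, denominator,
                     out=np.zeros(scores.shape[1]), where=denominator > 0)
```

A head with a high correlation is only a candidate: as the summary notes, attending to a relation does not by itself show that the head affects the prediction, which is why the third, interventional step is needed.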
Abstract
In this paper, we investigate the role of attention heads in Context-aware Machine Translation models for pronoun disambiguation in the English-to-German and English-to-French language directions. We analyze their influence by both observing and modifying the attention scores corresponding to the plausible relations that could impact a pronoun prediction. Our findings reveal that while some heads do attend to the relations of interest, not all of them influence the models' ability to disambiguate pronouns. We show that certain heads are underutilized by the models, suggesting that model performance could improve if these heads attended to one of the relations more strongly. Furthermore, we fine-tune the most promising heads and observe an increase in pronoun disambiguation accuracy of up to 5 percentage points, which demonstrates that the improvements in performance can be solidified into the models' parameters.
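To make the idea of "modifying the attention scores" concrete, the sketch below forces one attention weight in a single head's row to a chosen value and rescales the remaining weights so the row is still a probability distribution. The abstract does not specify the exact modification procedure, so the function name `set_relation_attention` and the proportional rescaling are assumptions, not the paper's method.

```python
import numpy as np

def set_relation_attention(attn_row, key_pos, value):
    """Return a copy of one attention row with the weight at key_pos set to value.

    attn_row: 1-D array of attention weights over keys that sums to 1.
    key_pos: index of the key token in the relation of interest (hypothetical).
    value: new weight in [0, 1] for that key.
    The other weights keep their relative proportions and are rescaled to
    sum to 1 - value. If all original mass was on key_pos, the other weights
    stay at zero and the row sums to value instead of 1.
    """
    row = np.asarray(attn_row, dtype=float).copy()
    rest = row.sum() - row[key_pos]
    if rest > 0:
        row *= (1.0 - value) / rest
    row[key_pos] = value
    return row
```

Comparing pronoun accuracy before and after such an intervention on a single head is one way to separate heads that merely attend to a relation from heads whose attention actually changes the prediction.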
