Simpson's Paradox and the Accuracy-Fluency Tradeoff in Translation
Zheng Wei Lim, Ekaterina Vylomova, Trevor Cohn, Charles Kemp
TL;DR
This work asks how translation accuracy and fluency can be reconciled. It formalizes the two quantities as $\text{accuracy}_M=\log p(\boldsymbol{x}|\boldsymbol{y})$ and $\text{fluency}_M=\log p(\boldsymbol{y})$, and shows that the apparent conflict between them is an instance of Simpson's paradox: accuracy and fluency are positively associated at the corpus level even though they trade off within individual source segments. The authors support this with a theoretical formulation, Gaussian-simulated toy data, and empirical analyses of human and machine translations across CRITT, RLTC, and MTMQM. A single NMT model is used to estimate the required probabilities, and the appendix repeats the analysis with the M2M100 model as an alternative check. The findings motivate segment-level evaluation and suggest balancing accuracy and fluency via noisy-channel objectives, with implications for both quality assessment protocols and MT system design.
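As a brief aside on why a noisy-channel objective relates the two quantities: assuming $\boldsymbol{x}$ denotes the source segment and $\boldsymbol{y}$ a candidate translation (an assumption consistent with the definitions above, not stated explicitly here), Bayes' rule gives

$$\log p(\boldsymbol{y}|\boldsymbol{x}) = \underbrace{\log p(\boldsymbol{x}|\boldsymbol{y})}_{\text{accuracy}_M} + \underbrace{\log p(\boldsymbol{y})}_{\text{fluency}_M} - \log p(\boldsymbol{x}).$$

For a fixed source segment, $\log p(\boldsymbol{x})$ is constant, so ranking candidate translations by $\log p(\boldsymbol{y}|\boldsymbol{x})$ is equivalent to ranking them by the sum of accuracy and fluency. This is only a sketch of the standard decomposition, not necessarily the exact objective the authors propose.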
Abstract
A good translation should be faithful to the source and should respect the norms of the target language. We address a theoretical puzzle about the relationship between these objectives. On one hand, intuition and some prior work suggest that accuracy and fluency should trade off against each other, and that capturing every detail of the source can only be achieved at the cost of fluency. On the other hand, quality assessment researchers often suggest that accuracy and fluency are highly correlated and difficult for human raters to distinguish (Callison-Burch et al., 2007). We show that the tension between these views is an instance of Simpson's paradox, and that accuracy and fluency are positively correlated at the level of the corpus but trade off at the level of individual source segments. We further suggest that the relationship between accuracy and fluency is best evaluated at the segment (or sentence) level, and that the tradeoff between these dimensions has implications both for assessing translation quality and for developing improved MT systems.
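The corpus-level versus segment-level pattern described above can be illustrated with a small Gaussian toy simulation. The sketch below is not the authors' setup: the function names, the per-segment "ease" variable, and all variances are hypothetical choices made only to reproduce the qualitative effect. Each simulated segment has a shared ease that raises both scores, while translations of the same segment vary along a direction that trades accuracy against fluency.

```python
import numpy as np


def simulate(n_segments=200, n_translations=20, seed=0):
    """Return (accuracy, fluency) arrays of shape (n_segments, n_translations)."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-segment "ease": easy segments score high on both dimensions.
    ease = rng.normal(0.0, 3.0, size=(n_segments, 1))
    # Within a segment, each translation sits somewhere along a tradeoff direction.
    tradeoff = rng.normal(0.0, 1.0, size=(n_segments, n_translations))
    noise_a = rng.normal(0.0, 0.3, size=(n_segments, n_translations))
    noise_f = rng.normal(0.0, 0.3, size=(n_segments, n_translations))
    accuracy = ease + tradeoff + noise_a  # gains accuracy ...
    fluency = ease - tradeoff + noise_f   # ... at the cost of fluency
    return accuracy, fluency


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


accuracy, fluency = simulate()

# Corpus level: pool every translation of every segment.
corpus_r = pearson(accuracy.ravel(), fluency.ravel())

# Segment level: correlate within each segment, then average.
segment_r = float(np.mean([pearson(a, f) for a, f in zip(accuracy, fluency)]))

print(f"corpus-level r:       {corpus_r:+.2f}")
print(f"mean segment-level r: {segment_r:+.2f}")
```

With these settings the pooled correlation is positive while the average within-segment correlation is negative, because between-segment variation in ease dominates the pooled data. Shrinking the ease variance relative to the tradeoff variance weakens the paradox and can eventually reverse it.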
