Quantifying the Plausibility of Context Reliance in Neural Machine Translation
Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza
TL;DR
PECoRe introduces a two-step interpretability framework to quantify context reliance in neural language generation, focusing on context-aware machine translation. The framework decomposes the task into Context-sensitive Token Identification (CTI) and Contextual Cues Imputation (CCI) to extract cue-target pairs and assess their plausibility against human rationales, using contrastive metrics such as $D_{KL}$, LR, and differences in negative log-likelihood. Evaluations of OpusMT and mBART-50 models on the SCAT+ and DiscEval-MT datasets show that end-to-end plausibility signals are robust for certain discourse phenomena, with gradient-based attributions generally outperforming attention-based cues in end-to-end settings. Applying PECoRe to Flores-101 demonstrates its practical utility on unannotated data, revealing both plausible and questionable context-driven translations and highlighting limitations of current plausibility evaluations in MT. The work suggests broader applicability to other generation tasks and encourages future integration with retrieval-augmented generation and chain-of-thought reasoning for trustworthy AI systems.
Abstract
Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated model translations to identify context-mediated predictions and highlight instances of (im)plausible context usage throughout generation.
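As a rough illustration of step (i), the sketch below scores each generated token by the KL divergence between the model's next-token distribution with and without context, then flags tokens whose score exceeds a cutoff. It is a toy sketch under stated assumptions, not the authors' implementation: the function names, the fixed `threshold`, and the hand-written three-word distributions are all hypothetical, and a real setup would obtain the distributions from a translation model.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def context_sensitive_tokens(ctx_dists, noctx_dists, threshold):
    """Return (position, score) for tokens whose distribution shifts with context.

    ctx_dists[i] and noctx_dists[i] are the next-token distributions at
    generation step i, with and without the context (hypothetical inputs).
    The fixed threshold is an assumption made for illustration only.
    """
    scores = [kl_divergence(p, q) for p, q in zip(ctx_dists, noctx_dists)]
    return [(i, s) for i, s in enumerate(scores) if s > threshold]

# Toy example over a 3-word vocabulary and two generation steps.
ctx_dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]    # with context
noctx_dists = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]  # without context

flagged = context_sensitive_tokens(ctx_dists, noctx_dists, threshold=0.1)
print(flagged)  # only step 1 is flagged: context changes the prediction there
```

Tokens flagged this way would then be passed to step (ii), where an attribution method links each of them to the contextual cues that motivate the prediction.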
