Quantifying the Plausibility of Context Reliance in Neural Machine Translation

Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza

TL;DR

PECoRe introduces a two-step interpretability framework to quantify context reliance in neural language generation, focusing on context-aware machine translation. The framework decomposes the task into Context-sensitive Token Identification (CTI) and Contextual Cues Imputation (CCI) to extract cue-target pairs and assess their plausibility against human rationales, using contrastive metrics such as $D_{KL}$, LR, and $-$log likelihood differences. Evaluations on the SCAT+ and DiscEval-MT datasets with OpusMT and mBART-50 models show that end-to-end plausibility signals are robust for certain discourse phenomena, with gradient-based attributions generally outperforming attention-based cues in end-to-end settings. Applying PECoRe to Flores-101 demonstrates its practical utility on unannotated data, revealing both plausible and questionable context-driven translations and highlighting limitations of current plausibility evaluations in MT. The work suggests broader applicability to other generation tasks and encourages future integration with retrieval-augmented generation and chain-of-thought reasoning for trustworthy AI systems.

Abstract

Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated model translations to identify context-mediated predictions and highlight instances of (im)plausible context usage throughout generation.
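
Step (i) of the abstract can be illustrated with a toy sketch. It assumes that the model's next-token distributions with and without context are already available for each target position, uses KL divergence as the contrastive metric, and flags a token as context-sensitive when its score exceeds the mean by one standard deviation. The function names, the toy distributions, and the thresholding rule are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def cti_labels(ctx_dists, noctx_dists, num_std=1.0):
    """Score every target position and flag outliers as context-sensitive.

    ctx_dists / noctx_dists: one next-token distribution per target position,
    obtained with and without the input context (hypothetical inputs).
    The mean + num_std * std threshold is an assumed selector.
    """
    scores = [kl_divergence(p, q) for p, q in zip(ctx_dists, noctx_dists)]
    threshold = mean(scores) + num_std * pstdev(scores)
    return scores, [s > threshold for s in scores]


# Toy distributions over a 3-word vocabulary for 4 target positions.
# Only the third position changes when the context is removed.
ctx = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.3, 0.3, 0.4]]
noctx = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.05, 0.9, 0.05], [0.3, 0.3, 0.4]]

scores, labels = cti_labels(ctx, noctx)
print(labels)
```

In this toy case only the third position is flagged, since it is the only one whose distribution shifts between the two settings. Step (ii) would then attribute the flagged prediction back to context tokens, which requires access to model internals and is not sketched here.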

Paper Structure (43 sections, 9 equations, 8 figures, 8 tables, 1 algorithm)

Figures (8)

  • Figure 1: Examples of sentence-level and contextual English$\rightarrow$Italian MT. Sentence-level translations contain lack-of-context errors. In the contextual case, instead, context-sensitive source tokens are disambiguated using source-based (Ⓢ) or target-based (Ⓣ) contextual cues to produce correct context-sensitive target tokens. PECoRe enables the end-to-end extraction of cue-target pairs (e.g. <she, alla pastorella>, <le pecore, le>).
  • Figure 2: The PECoRe framework. Left: Context-sensitive token identification (CTI). ①: A context-aware MT model translates source context ($C_x$) and current ($x$) sentences into target context ($C_{\hat{y}}$) and current ($\hat{y}$) outputs. ②: $\hat{y}$ is force-decoded in the non-contextual setting instead of natural output $\tilde{y}$. ③: Contrastive metrics are collected throughout the model for every $\hat{y}$ token to compare the two settings. ④: Selector $s_{\textsc{cti}}$ maps metrics to binary context-sensitive labels for every $\hat{y}_i$. Right: Contextual cues imputation (CCI). ①: Non-contextual target $\tilde{y}^*$ is generated from contextual prefix $\hat{y}_{<t}$. ②: Function $f_{\textsc{tgt}}$ is selected to contrast model predictions with ($\hat{y}_t$) and without ($\tilde{y}_t^*$) input context. ③: Attribution method $f_{\textsc{att}}$ using $f_{\textsc{tgt}}$ as target scores contextual cues driving $\hat{y}_t$ prediction. ④: Selector $s_{\textsc{cci}}$ selects relevant cues, and cue-target pairs are assembled.
  • Figure 3: Macro F1 of contrastive metrics for context-sensitive target token identification (CTI) using OpusMT Large on the full datasets (left) or on ok-cs context-sensitive subsets (right).
  • Figure 4: Macro F1 of CCI methods over full datasets using OpusMT Large models trained with only source context (left) or with source+target context (right). Boxes and red median lines show CCI results based on gold context-sensitive tokens. Dotted bars show median CCI scores obtained from context-sensitive tokens identified by KL-Divergence during CTI (E2E settings).
  • Figure 5: Macro F1 of contrastive metrics for context-sensitive target token identification (CTI) on the full datasets (left) or on ok-cs context-sensitive subsets (right). Top to bottom: ① OpusMT Small S$_{ctx}$, ② OpusMT Large S$_{ctx}$, ③ mBART-50 S$_{ctx}$, ④ OpusMT Small S+T$_{ctx}$, ⑤ OpusMT Large S+T$_{ctx}$, ⑥ mBART-50 S+T$_{ctx}$.
  • ...and 3 more figures
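
The figures above report macro F1 between model-selected tokens and human annotations. The sketch below shows one common reading of that score: F1 is computed separately for the positive (context-sensitive) and negative labels and then averaged. The exact averaging scheme used in the paper is not stated in this excerpt, so the helper names, the per-class averaging, and the toy labels are assumptions.

```python
def f1_for_class(gold, pred, cls):
    """F1 for one label, treating `cls` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, classes=(0, 1)):
    """Unweighted average of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Toy token-level labels: 1 = context-sensitive, 0 = not context-sensitive.
gold = [0, 0, 1, 0, 1, 0]  # hypothetical human annotation
pred = [0, 0, 1, 1, 0, 0]  # hypothetical model selection

score = macro_f1(gold, pred)
print(score)
```

Because context-sensitive tokens are typically a small minority, averaging over both classes keeps the majority label from dominating the score.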

Theorems & Definitions (5)

  • Definition 3.1
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Definition B.1