Analyzing Context Contributions in LLM-based Machine Translation

Emmanouil Zaranis, Nuno M. Guerreiro, André F. T. Martins

TL;DR

The paper demonstrates that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations, and its findings shed light on internal workings of LLM-based MT that go beyond those known for standard encoder-decoder MT models.

Abstract

Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input context remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various context parts, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) the source part of few-shot examples appears to contribute more than its corresponding targets, irrespective of translation direction; (2) finetuning LLMs with parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias where earlier few-shot examples have higher contributions to the translated sequence. Finally, we demonstrate that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT which go beyond those known for standard encoder-decoder MT models.

Paper Structure

This paper contains 57 sections, 21 figures, 11 tables.

Figures (21)

  • Figure 1: Synthetic illustration of the part-level total contribution computation, given 2 examples as context. From the token-to-token contribution matrix $\bm{M}_y^\ell$, we compute the total contribution of each input part to each generated token by summing the corresponding token-level contributions. We then compute the part-level total contribution of each input part to the translated sequence by averaging over the generated tokens.
  • Figure 2: Illustration of context's part-level contributions to the translated sequence, for all the examined models.
  • Figure 3: Example of anomalous source contributions for Tower, which hallucinates by copying information from the first example. We show contribution ratios relative to E1|SRC, so a ratio of $1$ corresponds to the contribution of E1|SRC itself.
  • Figure 4: Proportion of de-en samples that follow positional bias, for different values of $K$, in the (a) original and (b) replace-last-ex settings.
  • Figure 5: Illustration of context's part-level contributions, when the task description is added. Translation direction: German to English.
  • ...and 16 more figures
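
The aggregation described in the Figure 1 caption (sum token-level contributions within each input part, then average over the generated tokens) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `part_level_contributions`, the representation of each part as a half-open span of column indices, and the toy numbers are all hypothetical. Only the part labels follow the E1|SRC naming used in the Figure 3 caption.

```python
from typing import Dict, List, Tuple


def part_level_contributions(
    M: List[List[float]],
    parts: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Aggregate a token-to-token contribution matrix to part level.

    M[t][i] is the contribution of input token i to generated token t
    (assumed layout). Each part is a half-open span (start, end) of
    input-token indices (assumed representation).
    """
    if not M:
        raise ValueError("contribution matrix has no generated tokens")
    totals: Dict[str, float] = {}
    for name, (start, end) in parts.items():
        # Step 1: total contribution of this part to each generated token.
        per_token = [sum(row[start:end]) for row in M]
        # Step 2: average over the generated tokens.
        totals[name] = sum(per_token) / len(per_token)
    return totals


# Toy matrix (made-up values): 2 generated tokens, 4 input tokens.
M = [
    [0.5, 0.25, 0.125, 0.125],
    [0.25, 0.25, 0.25, 0.25],
]
# Hypothetical spans: one example source token, one example target token,
# and two tokens of the source text to translate.
parts = {"E1|SRC": (0, 1), "E1|TGT": (1, 2), "SRC": (2, 4)}

result = part_level_contributions(M, parts)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup each row of the matrix sums to 1, so the part-level values also sum to 1 and can be read as shares of the total contribution. Whether the real contribution matrices are normalized this way depends on the attribution method, which this listing does not specify.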