Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation
Nathaniel Berger, Stefan Riezler, Miriam Exel, Matthias Huck
TL;DR
This work introduces a light-weight, error-marked prompting approach that improves domain-specific machine translation by augmenting a post-editing translation memory (PE-TM) with token-level error markings. At test time, a user marks the errors in a machine translation, and a few similar in-context examples retrieved from the error-annotated PE-TM guide an LLM to focus its corrections on the marked tokens. Experiments on IT-domain English–German data with Llama 13B and GPT-3.5 show that error markings significantly increase targeted edits and improve BLEU and TER scores compared with MT from scratch and automatic post-editing (APE); in a human evaluation, the majority of the edits made with error markings (MRK) are judged correct. The study suggests a practical, interactive feedback loop for steering LLMs toward focused self-corrections and points to future work on learned error-marking models and larger translation memories.
Abstract
While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state of the art in machine translation (MT) of general-domain texts, post-editing (PE) is still required to correct errors and to enhance term translation quality in specialized domains. In this paper, we present a pilot study of enhancing translation memories (TM) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) to support correct and consistent term translation in technical domains. We investigate a light-weight two-step scenario where, at inference time, a human translator marks errors in the first translation step, and in a second step a few similar examples are extracted from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides the LLM to focus on correcting the marked errors, yielding consistent improvements over automatic PE (APE) and MT from scratch.
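The two-step scenario described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<bad>` / `</bad>` tag format, the prompt template, the word-overlap (Jaccard) retrieval, and all names (`PEEntry`, `mark_errors`, `retrieve_similar`, `build_prompt`) are hypothetical choices made here for illustration. The sketch only builds the prompt text; sending it to an LLM is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PEEntry:
    """One PE-TM entry: source, machine translation, error flags, reference."""
    source: str
    mt: str
    error_flags: List[bool]  # one flag per whitespace token of `mt`
    reference: str


def mark_errors(mt: str, error_flags: List[bool]) -> str:
    """Wrap each run of flagged tokens in <bad> ... </bad> (assumed tag format)."""
    tokens = mt.split()
    if len(tokens) != len(error_flags):
        raise ValueError("need exactly one flag per whitespace token")
    out, inside = [], False
    for tok, bad in zip(tokens, error_flags):
        if bad and not inside:
            out.append("<bad>")
        elif not bad and inside:
            out.append("</bad>")
        inside = bad
        out.append(tok)
    if inside:
        out.append("</bad>")
    return " ".join(out)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a simple stand-in for whatever similarity is really used."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_similar(source: str, tm: List[PEEntry], k: int = 2) -> List[PEEntry]:
    """Return the k PE-TM entries whose source is most similar to `source`."""
    return sorted(tm, key=lambda e: jaccard(source, e.source), reverse=True)[:k]


def build_prompt(source: str, mt: str, error_flags: List[bool],
                 tm: List[PEEntry], k: int = 2) -> str:
    """Few-shot prompt: k error-marked examples, then the error-marked test item."""
    blocks = [
        f"English: {ex.source}\n"
        f"German with error markings: {mark_errors(ex.mt, ex.error_flags)}\n"
        f"Corrected German: {ex.reference}"
        for ex in retrieve_similar(source, tm, k)
    ]
    blocks.append(
        f"English: {source}\n"
        f"German with error markings: {mark_errors(mt, error_flags)}\n"
        f"Corrected German:"
    )
    return "\n\n".join(blocks)
```

In this sketch, the markings in both the retrieved examples and the test item are what signal to the model which tokens to change; dropping the `mark_errors` call would reduce the prompt to a plain automatic post-editing setup.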
