Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding
Wafaa Mohammed, Vlad Niculae, Chrysoula Zerva
TL;DR
Large language models struggle with discourse phenomena in document-level translation. This paper introduces quality-aware decoding (QAD), an inference strategy based on minimum Bayes risk (MBR) decoding that uses translation-quality signals to extract latent discourse knowledge from LLMs. Evaluated across six language pairs on DELA, TED2020, and WMT24++, QAD consistently improves both overall translation quality and discourse handling; context-aware prompting further improves the handling of discourse phenomena, and TowerInstruct-13B often achieves the best results. The work contributes an evaluation framework, ablation studies of decoding setups, and human-annotated discourse data. It shows that QAD can make LLMs more suitable for real-world, discourse-sensitive translation tasks and offers practical guidance for deploying discourse-aware MT systems.
Abstract
Large language models (LLMs) have emerged as strong contenders in machine translation. Yet, they still struggle to adequately handle discourse phenomena, such as pronoun resolution and lexical cohesion, at the document level. In this study, we thoroughly investigate how well LLMs handle discourse phenomena in context-aware translation. We demonstrate that discourse knowledge is encoded within LLMs and propose the use of quality-aware decoding (QAD) to effectively extract this knowledge, showing its superiority over other decoding approaches through comprehensive analysis. Furthermore, we illustrate that QAD enhances the semantic richness of translations and aligns them more closely with human preferences.
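To make the idea of MBR-based selection concrete, the following is a minimal sketch of how a decoder could pick one translation from a pool of sampled candidates by maximizing expected utility against the other candidates. It is an illustration under stated assumptions, not the authors' implementation: the function names `token_f1` and `mbr_select` are hypothetical, the candidate strings are invented, and the token-overlap utility is only a toy stand-in for the learned translation-quality metric a real QAD setup would use.

```python
from collections import Counter
from typing import Callable, Sequence


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy utility: token-level F1 overlap.

    Assumption: this stands in for a learned quality metric; it is not
    the metric used in the paper.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_f1,
) -> str:
    """Return the candidate with the highest average utility when every
    other candidate is treated as a pseudo-reference."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical candidate pool, e.g. translations sampled from an LLM.
candidates = [
    "She gave him the book because he asked for it .",
    "She gave him the book because it asked for it .",
    "She gave him the book since he asked for it .",
]
selected = mbr_select(candidates)
print(selected)
```

In this toy pool the first candidate is selected because it agrees most with the other two. Swapping `token_f1` for a stronger quality estimator changes which hypotheses are preferred without changing the selection logic.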
