TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment
Zheng Li, Mao Zheng, Mingyang Song, Wenjie Yang
TL;DR
TAT-R1 addresses terminology translation in MT by integrating reinforcement learning with word-alignment-based rewards into a DeepSeek-R1–style reasoning framework. It combines a format reward, a COMET-based semantic reward, and three word-alignment rewards ($R_{aaw}$, $R_{aao}$, $R_{taw}$) within a GRPO-based RL routine, yielding an overall objective $R_{all}$ that promotes accurate term translations without sacrificing general translation quality. Empirical results on WMT ZH↔EN and RTT terminology data show substantial improvements in terminology accuracy and related semantic metrics, with ablations confirming the value of each reward and the robustness of RL over SFT. The work advances domain-specific MT by enabling terminology-aware translation through principled reward design and alignment-based guidance, with practical implications for specialized multilingual workflows.
Abstract
Recently, deep reasoning large language models (LLMs) such as DeepSeek-R1 have made significant progress on tasks such as mathematics and coding. Inspired by this, several studies have employed reinforcement learning (RL) to enhance models' deep reasoning capabilities and improve machine translation (MT) quality. However, terminology translation, an essential task in MT, remains unexplored in deep reasoning LLMs. In this paper, we propose TAT-R1, a terminology-aware translation model trained with reinforcement learning and word alignment. Specifically, we first extract keyword translation pairs using a word alignment model. We then design three types of rule-based alignment rewards from the extracted alignment relationships. With these alignment rewards, the RL-trained translation model learns to focus on accurately translating key information in the source text, including terminology. Experimental results show the effectiveness of TAT-R1: our model significantly improves terminology translation accuracy over the baseline models while maintaining comparable performance on general translation tasks. In addition, we conduct detailed ablation studies of the DeepSeek-R1-like training paradigm for machine translation and report several key findings.
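
To make the idea of a rule-based alignment reward concrete, the following is a minimal sketch of one possible overlap-style reward. It is an illustration only and not the paper's definition of any of its three alignment rewards: the function name `alignment_overlap_reward`, the whitespace tokenization, and the example keyword pairs are all assumptions introduced here. The sketch assumes a word alignment model has already produced (source word, reference target word) pairs and scores a candidate translation by the fraction of those target words it contains.

```python
from typing import List, Tuple


def alignment_overlap_reward(
    hypothesis: str, aligned_pairs: List[Tuple[str, str]]
) -> float:
    """Hypothetical reward: share of aligned target words found in the hypothesis.

    aligned_pairs holds (source_word, reference_target_word) tuples that are
    assumed to come from an external word alignment model.
    """
    if not aligned_pairs:
        # No keyword pairs extracted: give no alignment signal.
        return 0.0
    # Assumption: lowercase whitespace tokenization is adequate for the target side.
    hyp_tokens = set(hypothesis.lower().split())
    hits = sum(1 for _, tgt in aligned_pairs if tgt.lower() in hyp_tokens)
    return hits / len(aligned_pairs)


# Illustrative (made-up) keyword pairs for a ZH->EN sentence.
pairs = [("网络", "network"), ("梯度", "gradient")]
full = alignment_overlap_reward("the network uses gradient descent", pairs)
half = alignment_overlap_reward("the network uses slope descent", pairs)
print(full, half)
```

A reward of this shape is cheap to compute for every sampled translation during RL training, which is one plausible reason to prefer rule-based alignment signals; the actual reward definitions and their weighting are given in the body of the paper.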
