TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment

Zheng Li, Mao Zheng, Mingyang Song, Wenjie Yang

TL;DR

TAT-R1 addresses terminology translation in MT by integrating reinforcement learning with word-alignment-based rewards into a DeepSeek-R1-style reasoning framework. It combines a format reward, a COMET-based semantic reward, and three word-alignment rewards (R_aaw, R_aao, R_taw) within a GRPO-based RL routine, yielding an overall reward R_all that promotes accurate term translation without sacrificing general translation quality. Empirical results on WMT ZH↔EN and RTT terminology data show substantial improvements in terminology accuracy and related semantic metrics, and ablations confirm the value of each reward and the robustness of RL over SFT. The work advances domain-specific MT through principled reward design and alignment-based guidance, with practical implications for specialized multilingual workflows.
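The reward pipeline summarized above can be illustrated with a minimal Python sketch. It assumes a DeepSeek-R1-style output template (reasoning in `<think>…</think>`, followed by the translation in `<answer>…</answer>`), treats the COMET score and the alignment rewards as precomputed numbers, and combines everything as an unweighted sum. The helper names (`format_reward`, `total_reward`, `group_relative_advantages`) and the summation are hypothetical and are not the paper's definition of R_all. The advantage step follows the standard GRPO recipe of normalizing each reward by the mean and standard deviation of its sampling group.

```python
import re
import statistics

# Assumed output template (DeepSeek-R1 convention, not confirmed for TAT-R1):
# reasoning inside <think>...</think>, final translation inside <answer>...</answer>.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed think/answer template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def total_reward(completion: str, comet_score: float, alignment_scores: list[float]) -> float:
    """Hypothetical R_all: unweighted sum of the format reward, a precomputed
    COMET score, and the alignment rewards. The paper's weighting may differ."""
    return format_reward(completion) + comet_score + sum(alignment_scores)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by the mean and (population)
    standard deviation of its group, i.e. all completions sampled for one source."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One hypothetical group of three sampled completions for the same source sentence:
    # (completion text, precomputed COMET score, precomputed alignment rewards).
    group = [
        ("<think>...</think><answer>The cat sat.</answer>", 0.82, [1.0, 0.5, 1.0]),
        ("The cat sat.", 0.80, [1.0, 0.5, 1.0]),  # no tags, so the format reward is 0
        ("<think>...</think><answer>A cat.</answer>", 0.60, [0.0, 0.0, 0.5]),
    ]
    rewards = [total_reward(text, comet, align) for text, comet, align in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this toy group, the second completion has almost the same COMET score as the first but omits the template, so its total reward is 1.0 lower and its group-relative advantage is correspondingly smaller.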

Abstract

Recently, deep reasoning large language models (LLMs) like DeepSeek-R1 have made significant progress in tasks such as mathematics and coding. Inspired by this, several studies have employed reinforcement learning (RL) to enhance models' deep reasoning capabilities and improve machine translation (MT) quality. However, terminology translation, an essential task in MT, remains unexplored in deep reasoning LLMs. In this paper, we propose TAT-R1, a terminology-aware translation model trained with reinforcement learning and word alignment. Specifically, we first extract keyword translation pairs using a word alignment model. We then design three types of rule-based alignment rewards based on the extracted alignment relationships. With these alignment rewards, the RL-trained translation model can learn to focus on accurately translating key information, including terminology in the source text. Experimental results show the effectiveness of TAT-R1: our model significantly improves terminology translation accuracy compared to the baseline models while maintaining comparable performance on general translation tasks. In addition, we conduct detailed ablation studies of the DeepSeek-R1-like training paradigm for machine translation and reveal several key findings.
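The first two steps described in the abstract (extracting keyword translation pairs with a word aligner, then scoring translations with a rule-based reward) can be sketched in Python as follows. This is an illustrative assumption, not a reproduction of any of the paper's three alignment rewards: the alignment links are taken as given (in practice they would come from a word alignment model, which is not implemented here), the keyword set is supplied by hand, and the reward is simply the fraction of source keywords whose aligned reference word appears verbatim in the hypothesis. Both function names are hypothetical.

```python
def extract_keyword_pairs(
    src_tokens: list[str],
    ref_tokens: list[str],
    alignments: list[tuple[int, int]],
    keywords: set[str],
) -> dict[str, set[str]]:
    """Map each source keyword to the reference words it is aligned to.
    `alignments` holds (source_index, reference_index) links, assumed to be
    produced by an external word alignment model."""
    pairs: dict[str, set[str]] = {}
    for s, t in alignments:
        word = src_tokens[s]
        if word in keywords:
            pairs.setdefault(word, set()).add(ref_tokens[t])
    return pairs


def keyword_coverage_reward(hyp_tokens: list[str], pairs: dict[str, set[str]]) -> float:
    """Hypothetical rule-based reward: fraction of source keywords for which at
    least one aligned reference word appears (exact, case-sensitive match) in
    the hypothesis."""
    if not pairs:
        return 1.0  # no keywords to check, so nothing is penalized
    hyp = set(hyp_tokens)
    hits = sum(1 for targets in pairs.values() if targets & hyp)
    return hits / len(pairs)


if __name__ == "__main__":
    src = ["他", "患有", "糖尿病"]
    ref = ["He", "has", "diabetes"]
    links = [(0, 0), (1, 1), (2, 2)]  # assumed aligner output
    pairs = extract_keyword_pairs(src, ref, links, keywords={"糖尿病"})
    print(pairs)
    print(keyword_coverage_reward(["He", "suffers", "from", "diabetes"], pairs))
    print(keyword_coverage_reward(["He", "has", "sugar", "disease"], pairs))
```

Exact token matching keeps the reward rule-based and cheap to compute, which suits RL rollouts; the trade-off is that a valid synonym of the reference term earns no credit.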

Paper Structure

This paper contains 14 sections, 13 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The overview of TAT-R1 training with RL and word alignment.
  • Figure 2: Performance comparison between SFT and RL.
  • Figure 3: Qualitative examples illustrating the effect of different rewards on EN to ZH translation.
  • Figure 4: Comparison of average performance across different word alignment rewards.