Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR

This paper tackles aligning machine translation outputs with human preferences while reducing the data burden through cost-effective preference learning. It introduces a three-stage RLHF framework: supervised fine-tuning (SFT) on parallel data to establish translation ability, reward-model (RM) training that contrasts high-quality human book translations with SFT model outputs to capture human preferences, and RL fine-tuning with a KL-penalized objective to improve translations under RM guidance. The results show that RLHF can enhance translation quality and transfer to translation directions not trained with RLHF, and that the RM's language capabilities and training-data properties significantly influence outcomes. Limitations include the dataset scope (primarily English-Chinese book data) and limited evaluation coverage; future work aims to broaden language coverage and perform more extensive manual evaluations.
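The TL;DR names a KL-penalized RL objective without spelling it out. A common formulation in InstructGPT-style RLHF (assumed here, not confirmed as this paper's exact objective) rewards the policy with the RM score minus a KL term: r(x, y) − β · log(π_θ(y|x) / π_SFT(y|x)), which discourages the policy from drifting far from the SFT model. The sketch below computes that sequence-level reward from per-token log-probabilities; the function name, its arguments, and the default β are hypothetical.

```python
def kl_penalized_reward(rm_score, logprobs_policy, logprobs_sft, beta=0.05):
    """Sequence-level reward: RM score minus beta times a KL estimate.

    rm_score: scalar reward-model score for the full translation (assumed input).
    logprobs_policy / logprobs_sft: per-token log-probabilities of the sampled
        translation under the RL policy and under the frozen SFT model.
    beta: KL coefficient (hypothetical default, not taken from the paper).
    """
    if len(logprobs_policy) != len(logprobs_sft):
        raise ValueError("token log-probability sequences must be aligned")
    # Summing per-token log-ratios gives log pi_theta(y|x) - log pi_SFT(y|x)
    # for this sample, a single-sample estimate of the KL divergence.
    kl_estimate = sum(p - q for p, q in zip(logprobs_policy, logprobs_sft))
    # Penalize the RM score in proportion to how far the policy has drifted.
    return rm_score - beta * kl_estimate
```

For example, `kl_penalized_reward(1.2, [-0.5, -1.0], [-0.7, -1.3], beta=0.1)` has a log-ratio sum of 0.5 and returns approximately 1.15. Many implementations instead apply the KL term per token and add the RM score only at the final token; the sequence-level version is shown here for brevity.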

Abstract

Faithfulness, expressiveness, and elegance are the constant pursuit in machine translation. However, traditional metrics such as BLEU do not strictly align with human preferences regarding translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. Collecting a large, high-quality dataset of human comparisons between translations is non-trivial, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy that optimizes reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translations relative to human translations and guides subsequent improvements in machine translation. Experimental results demonstrate that RLHF can effectively enhance translation quality, and this improvement also benefits other translation directions not trained with RLHF. Further analysis indicates that the model's language capabilities play a crucial role in preference learning: a reward model with strong language capabilities can more sensitively learn subtle differences in translation quality and align better with real human translation preferences.
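The abstract says the reward model is optimized by distinguishing human translations from machine translations. A standard way to train such a model (assumed here; the abstract does not state the loss) is the Bradley-Terry pairwise ranking loss, −log σ(r_chosen − r_rejected), with the human translation as "chosen" and the machine translation of the same source as "rejected". The minimal sketch below computes the mean loss over a batch of precomputed scalar scores; the function names are hypothetical.

```python
import math


def softplus(z):
    # log(1 + exp(z)), rearranged so exp() never receives a large positive value.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_preference_loss(chosen_scores, rejected_scores):
    """Mean of -log sigmoid(r_chosen - r_rejected) over a batch of pairs.

    Assumption: chosen_scores are RM scores for human (expert) translations,
    rejected_scores are RM scores for machine (SFT) translations of the same
    source sentences, aligned by index.
    """
    if not chosen_scores or len(chosen_scores) != len(rejected_scores):
        raise ValueError("need equally sized, non-empty score lists")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_scores, rejected_scores):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) is identical to softplus(-m).
        total += softplus(-margin)
    return total / len(chosen_scores)
```

A zero margin gives a loss of log 2 (about 0.693), and the loss shrinks toward zero as the RM scores the human translation increasingly higher than the machine translation, which is the signal that lets the RM learn the machine translation's deficiencies.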

Paper Structure (20 sections, 3 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: An overview of modeling translation preferences using RLHF. To achieve cost-effective preference learning, we optimize the reward model in the second step by contrasting the deficiencies of SFT model translations with human expert translations, thus avoiding expensive labeling of preference data.
  • Figure 2: The process of constructing the English-Chinese book dataset.
  • Figure 3: Comparison between preference-optimized models and the SFT model on the En→Zh task. G and H denote GPT-4 and human evaluators, respectively.
  • Figure 4: Comparison between preference-optimized models and the SFT model on the Zh→En task. G and H denote GPT-4 and human evaluators, respectively.
  • Figure 5: Comparison between the preference-optimized model and the SFT model in the En→Zh translation direction after replacing the base model used in Figure 3 with LLaMA.
  • ...and 1 more figure
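Figure 1 describes building reward-model training data without human preference labels: each source sentence's human expert translation is treated as preferred over the SFT model's translation. The sketch below assembles such pairs from parallel data; `PreferencePair`, `build_pairs`, and the `translate` callable (a stand-in for decoding with the SFT model) are hypothetical names, and skipping pairs where the two translations are identical is an added assumption, not a step stated in the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PreferencePair:
    source: str
    chosen: str    # human expert translation (assumed preferred)
    rejected: str  # translation produced by the SFT model


def build_pairs(
    parallel: List[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[PreferencePair]:
    """Turn (source, human translation) pairs into RM preference pairs."""
    pairs = []
    for source, human_translation in parallel:
        # Hypothetical stand-in for generating a translation with the SFT model.
        machine_translation = translate(source)
        # Assumption: an SFT output identical to the human translation carries
        # no preference signal, so the pair is skipped.
        if machine_translation.strip() == human_translation.strip():
            continue
        pairs.append(PreferencePair(source, human_translation, machine_translation))
    return pairs
```

Because the "rejected" side comes from the model being improved, these pairs target that model's own deficiencies, which is what makes the strategy cheaper than collecting human comparisons.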