Fine-tuning Large Language Models for Domain-specific Machine Translation

Jiawei Zheng; Hanghai Hong; Feiyan Liu; Xiaoli Wang; Jingsong Su; Yonggui Liang; Shikai Wu

TL;DR

This work tackles domain-specific MT by pairing high-quality in-domain data with DragFT, a three-part fine-tuning framework. DragFT combines dictionary-enhanced prompting (notably Dict-rephrasing), RAG-based few-shot example selection, and LoRA-based fine-tuning to adapt 13B-parameter LLM backbones to IT, Law, and Medical translation tasks. Experiments show that DragFT delivers substantial gains across multiple domains and backbones, often surpassing GPT-3.5 and GPT-4o baselines and improving terminology translation and UTW metrics. The approach demonstrates that targeted knowledge augmentation and highly relevant in-domain examples can significantly enhance domain-specific MT while mitigating noise from broad-domain pretraining.
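The TL;DR names LoRA as the fine-tuning method. A minimal pure-Python sketch of a LoRA-augmented linear layer shows why this is cheap: the pretrained weight W stays frozen and only a rank-r update B A, scaled by alpha / r, is trained. The rank, alpha, initialization scale, and the 5120 x 5120 matrix size used below are illustrative assumptions, not DragFT's reported configuration, and the class and function names are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration; rank, alpha and init scale are assumptions.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so B @ A is zero and the layer initially equals the base model.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)              # W x, never updated
        delta = matvec(self.B, matvec(self.A, x))  # B (A x), only A and B are trained
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def full_params(d_out, d_in):
    """Parameters updated by full fine-tuning of one d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters updated by LoRA for the same matrix: r * (d_out + d_in)."""
    return rank * (d_out + d_in)


layer = LoRALinear([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0, 1.0]))  # [3.0, 0.0]: identical to W x at initialization
print(lora_params(5120, 5120, 8), full_params(5120, 5120))  # 81920 26214400
```

Under these assumed sizes, a rank-8 update to a 5120 x 5120 matrix trains 81,920 parameters instead of 26,214,400 (about 0.3%). Initializing B to zero means fine-tuning starts exactly from the base model's behavior.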

Abstract

Large language models (LLMs) have shown great potential in domain-specific machine translation (MT). However, LLMs pre-trained on general-domain corpora may not generalize well to specific domains because they lack domain-specific knowledge. To address this issue, this paper enhances the domain-specific MT capability of LLMs by providing high-quality training datasets and proposing a novel fine-tuning framework called DragFT. DragFT augments LLMs with three techniques: (i) dictionary-enhanced prompting, which integrates dictionary information into prompts to improve the translation of domain-specific terminology; (ii) RAG-based few-shot example selection, which provides high-quality examples that capture both domain and style characteristics; and (iii) fine-tuning with few-shot examples, which further improves performance when in-domain examples are used. We deploy DragFT on three well-known 13B-parameter LLM backbones to validate its effectiveness. Results on three domain-specific datasets show that DragFT achieves a significant performance boost and outperforms advanced models such as GPT-3.5 and GPT-4o. We attribute this large improvement over existing LLMs to incorporating relevant knowledge while mitigating noise.
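Techniques (i) and (ii) can be illustrated with a small prompt builder. This is a sketch under stated assumptions, not the authors' implementation: the bag-of-words cosine retriever stands in for whatever retriever DragFT actually uses, the instruction-style term hint is not necessarily any of the paper's three dictionary-prompt variants, and all function names, the English-to-German pair, and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(source, pool, k=2):
    """Return the k (src, tgt) pairs from an in-domain pool most similar to source."""
    query = tokenize(source)
    ranked = sorted(pool, key=lambda pair: cosine(query, tokenize(pair[0])), reverse=True)
    return ranked[:k]


def dictionary_hints(source, dictionary):
    """Dictionary entries whose source term occurs in the sentence (naive substring match)."""
    lowered = source.lower()
    return [(term, dictionary[term]) for term in sorted(dictionary) if term.lower() in lowered]


def build_prompt(source, pool, dictionary, src_lang="English", tgt_lang="German", k=2):
    """Instruction + term hints + retrieved examples + the sentence to translate."""
    parts = [f"Translate the following {src_lang} text into {tgt_lang}."]
    hints = dictionary_hints(source, dictionary)
    if hints:
        parts.append("Use these term translations: "
                     + "; ".join(f'"{s}" -> "{t}"' for s, t in hints))
    for ex_src, ex_tgt in select_examples(source, pool, k):
        parts.append(f"{src_lang}: {ex_src}\n{tgt_lang}: {ex_tgt}")
    parts.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(parts)


def make_training_record(source, target, pool, dictionary, k=2):
    """One fine-tuning instance: the few-shot prompt paired with the reference translation."""
    # Exclude the pair itself so it cannot be retrieved as its own example.
    others = [pair for pair in pool if pair[0] != source]
    return {"prompt": build_prompt(source, others, dictionary, k=k), "completion": target}


# Toy in-domain data, invented for illustration only.
pool = [
    ("Click the Save button.", "Klicken Sie auf die Schaltfläche Speichern."),
    ("The patient received a dose.", "Der Patient erhielt eine Dosis."),
    ("Click Cancel to close the dialog.", "Klicken Sie auf Abbrechen, um den Dialog zu schließen."),
]
dictionary = {"button": "Schaltfläche", "dose": "Dosis"}
source = "Click the Open button."

print(build_prompt(source, pool, dictionary))
```

Here the two IT-domain pairs are retrieved and the medical pair is not, and only the "button" entry is injected because "dose" does not occur in the input. Passing pairs through `make_training_record` gives one plausible shape for technique (iii): the model is fine-tuned on prompts that already contain retrieved examples, and dropping the pair itself from the pool keeps the answer from leaking into its own prompt.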

Paper Structure (27 sections, 3 equations, 5 figures, 9 tables, 1 algorithm)

Introduction
Related Works
ICL in Machine Translation
Instruction tuning in Machine Translation
Domain-specific Machine Translation
DragFT
Machine Translation Task
Dictionary-enhanced Prompting
RAG-based Few-shot Example Selection
Fine-tuning with Few-shot Examples
Experimental Setups
Datasets
Baselines
Implementation Details
Evaluation
...and 12 more sections

Figures (5)

Figure 1: The framework of DragFT, including three techniques: (i) Dictionary-enhanced prompting, (ii) RAG-based few-shot example selection, and (iii) Fine-tuning with few-shot examples.
Figure 1: The length distribution of tokenized outputs on the WMT22 test set (Zh⇒En).
Figure 2: An illustration of three dictionary enhancement prompts, including Dict-instruction, Dict-chain, and Dict-rephrasing.
Figure 3: Performance comparison of different dictionary-enhanced prompting methods on domain-specific test sets.
Figure 4: Comparison between the UTW before and after applying DragFT.
