Fine-tuning Large Language Models for Domain-specific Machine Translation

Jiawei Zheng, Hanghai Hong, Feiyan Liu, Xiaoli Wang, Jingsong Su, Yonggui Liang, Shikai Wu

TL;DR

This work tackles domain-specific MT by combining high-quality domain data with a three-part fine-tuning framework, DragFT. The framework brings together dictionary-enhanced prompting (notably Dict-rephrasing), RAG-based few-shot example selection, and LoRA-based fine-tuning to adapt 13B-class LLM backbones to IT, Law, and Medical translation tasks. Empirical results show that DragFT delivers substantial gains across multiple domains and backbones, often surpassing the GPT-3.5 and GPT-4o baselines and improving both terminology translation and UTW metrics. The approach demonstrates that targeted knowledge augmentation and highly relevant in-domain examples can significantly enhance domain-specific MT while mitigating noise from broad-domain pretraining.

Abstract

Large language models (LLMs) have shown great potential in domain-specific machine translation (MT). However, one major issue is that LLMs pre-trained on general-domain corpora might not generalize well to specific domains due to the lack of domain-specific knowledge. To address this issue, this paper focuses on enhancing the domain-specific MT capability of LLMs by providing high-quality training datasets and proposing a novel fine-tuning framework denoted by DragFT. DragFT augments LLMs via three techniques: (i) dictionary-enhanced prompting integrates dictionary information into prompts to improve the translation of domain-specific terminology; (ii) RAG-based few-shot example selection provides high-quality examples that simulate both the domain and style characteristics; (iii) fine-tuning with few-shot examples further enhances performance when using in-domain examples. We deploy DragFT on three well-known LLM backbones with 13B parameters to validate its effectiveness. The results on three domain-specific datasets show that DragFT achieves a significant performance boost and outperforms advanced models such as GPT-3.5 and GPT-4o. The drastic performance improvement of DragFT over existing LLMs can be attributed to incorporating relevant knowledge while mitigating noise.
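
To make the first two techniques concrete, the following is a minimal sketch of how dictionary hints and retrieved few-shot examples might be assembled into one translation prompt. It is not the authors' implementation: the prompt wording, the function names (`select_examples`, `build_prompt`), the toy English-to-German data, and the use of Jaccard token overlap as a stand-in for the retrieval step are all assumptions made for illustration. A real RAG setup would more likely rank examples with a dense embedding model.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence, reference translation)


def tokenize(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a stand-in for a real tokenizer."""
    return set(text.lower().split())


def select_examples(source: str, pool: List[Pair], k: int = 2) -> List[Pair]:
    """Return the k pool pairs whose source side overlaps most with `source`.

    Similarity is Jaccard overlap of token sets (an assumption; the paper's
    retriever is not specified in this section).
    """
    query = tokenize(source)

    def score(pair: Pair) -> float:
        cand = tokenize(pair[0])
        union = query | cand
        return len(query & cand) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their original pool order.
    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(source: str, dictionary: Dict[str, str], pool: List[Pair],
                 src_lang: str = "English", tgt_lang: str = "German",
                 k: int = 2) -> str:
    """Assemble a prompt with dictionary hints and retrieved examples."""
    lines = [f"Translate the following sentence from {src_lang} to {tgt_lang}."]

    # Dictionary hints: keep only terms that actually occur in the source.
    hints = [f"{term} -> {trans}" for term, trans in dictionary.items()
             if term.lower() in source.lower()]
    if hints:
        lines.append("Use these term translations:")
        lines.extend(hints)

    # Few-shot examples chosen by similarity to the input sentence.
    for ex_src, ex_tgt in select_examples(source, pool, k):
        lines.append(f"{src_lang}: {ex_src}")
        lines.append(f"{tgt_lang}: {ex_tgt}")

    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Toy, hand-written data for illustration only.
POOL = [
    ("Restart the server after the update.",
     "Starten Sie den Server nach dem Update neu."),
    ("The contract is void.", "Der Vertrag ist nichtig."),
    ("Restart the router.", "Starten Sie den Router neu."),
]
TERMS = {"server": "Server", "firewall": "Firewall"}

if __name__ == "__main__":
    print(build_prompt("Restart the server now.", TERMS, POOL))
```

The same prompt format could then be used to build training instances for the third technique, fine-tuning with few-shot examples, though how the paper constructs those instances is not described in this section.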

Paper Structure

This paper contains 27 sections, 3 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The framework of DragFT, including three techniques: (i) Dictionary-enhanced prompting, (ii) RAG-based few-shot example selection, and (iii) Fine-tuning with few-shot examples.
  • Figure 1: The length distribution of tokenized outputs on the WMT22 test set (Zh$\Rightarrow$En).
  • Figure 2: An illustration of three dictionary enhancement prompts, including Dict-instruction, Dict-chain, and Dict-rephrasing.
  • Figure 3: Performance comparison of different dictionary-enhanced prompting methods on domain-specific test sets.
  • Figure 4: Comparison between the UTW before and after applying DragFT.