Fine-tuning Large Language Models for Domain-specific Machine Translation
Jiawei Zheng, Hanghai Hong, Feiyan Liu, Xiaoli Wang, Jingsong Su, Yonggui Liang, Shikai Wu
TL;DR
This work tackles domain-specific MT by combining high-quality domain data with DragFT, a three-part fine-tuning framework. DragFT combines dictionary-enhanced prompting (notably Dict-rephrasing), RAG-based few-shot example selection, and LoRA-based fine-tuning to adapt 13B-class LLM backbones to IT, Law, and Medical translation tasks. Empirical results show that DragFT delivers substantial gains across multiple domains and backbones, often surpassing GPT-3.5 and GPT-4o baselines and improving terminology translation and UTW metrics. The approach demonstrates that targeted knowledge augmentation and highly relevant in-domain examples can significantly enhance domain-specific MT while mitigating noise from broad-domain pretraining.
Abstract
Large language models (LLMs) have shown great potential in domain-specific machine translation (MT). However, LLMs pre-trained on general-domain corpora may not generalize well to specific domains because they lack domain-specific knowledge. To address this issue, this paper enhances the domain-specific MT capability of LLMs by providing high-quality training datasets and proposing a novel fine-tuning framework, DragFT. DragFT augments LLMs via three techniques: (i) dictionary-enhanced prompting, which integrates dictionary information into prompts to improve the translation of domain-specific terminology; (ii) retrieval-augmented generation (RAG)-based few-shot example selection, which provides high-quality examples that capture both domain and style characteristics; and (iii) fine-tuning with few-shot examples, which further enhances performance when in-domain examples are used. We deploy DragFT on three well-known LLM backbones with 13B parameters to validate its effectiveness. Results on three domain-specific datasets show that DragFT achieves a significant performance boost and outperforms advanced models such as GPT-3.5 and GPT-4o. We attribute this large improvement over existing LLMs to incorporating relevant knowledge while mitigating noise.
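The first two DragFT components, dictionary-enhanced prompting and retrieval-based few-shot selection, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation. The term dictionary, example pool, English-to-German language pair, and prompt template are all hypothetical. The sketch uses a simple bag-of-words cosine similarity as a stand-in retriever; the paper's actual retriever and its Dict-rephrasing strategy are not described in the abstract. The LoRA fine-tuning step is not shown, although prompts of this shape could plausibly serve as fine-tuning inputs.

```python
import math
import re
from collections import Counter

# Hypothetical in-domain resources (assumption): the paper's real dictionaries,
# example pools, language pairs, and prompt template are not given here.
TERM_DICT = {
    "hypertension": "Bluthochdruck",
    "dosage": "Dosierung",
}

EXAMPLE_POOL = [
    ("Monitor the patient for hypertension.", "Überwachen Sie den Patienten auf Bluthochdruck."),
    ("Reduce the dosage if side effects occur.", "Reduzieren Sie die Dosierung, wenn Nebenwirkungen auftreten."),
    ("Restart the server after the update.", "Starten Sie den Server nach dem Update neu."),
]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(source: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k pool pairs whose source side is lexically closest to `source`."""
    query = Counter(tokenize(source))
    ranked = sorted(pool, key=lambda pair: cosine(query, Counter(tokenize(pair[0]))), reverse=True)
    return ranked[:k]


def dictionary_hints(source: str, term_dict: dict[str, str]) -> list[tuple[str, str]]:
    """Collect dictionary entries whose source term occurs as a whole word or phrase."""
    lowered = source.lower()
    return [
        (src, tgt)
        for src, tgt in term_dict.items()
        if re.search(r"\b" + re.escape(src.lower()) + r"\b", lowered)
    ]


def build_prompt(source: str, term_dict: dict[str, str], pool: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble a prompt with term hints, retrieved examples, and the query sentence."""
    parts = ["Translate the following English text into German."]
    hints = dictionary_hints(source, term_dict)
    if hints:
        parts.append("Use these term translations: " + "; ".join(f"{s} -> {t}" for s, t in hints))
    for src, tgt in retrieve_examples(source, pool, k):
        parts.append(f"English: {src}\nGerman: {tgt}")
    parts.append(f"English: {source}\nGerman:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("The dosage for hypertension patients must be reduced.", TERM_DICT, EXAMPLE_POOL))
```

For the medical query above, the two medical examples outrank the IT one. The prompt therefore carries both the dictionary hints and in-domain demonstrations, which is the intuition behind pairing terminology knowledge with relevant examples. A dense-embedding retriever would be a natural replacement for the lexical scorer.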
