Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization

Inacio Vieira; Antonio Castaldo; James O'Doherty; Sheila Castilho

TL;DR

We address domain adaptation in machine translation (MT) by applying Contrastive Preference Optimization (CPO) to align LLMs with domain-specific terminology and style in a data-efficient way. The method simulates a post-editing workflow: it contrasts the base model's raw translation (negative) with a human-approved translation from a domain translation memory (TM) (positive), using an on-policy, single-stage training objective that combines a preference signal with standard supervised fine-tuning (SFT). Empirical results on English–Brazilian Portuguese and English–Korean show that only 14.7k synthetic preference pairs can match or exceed the performance of SFT trained on 160k+ in-domain examples, with substantially less compute (about 51% fewer GPU-seconds) and lower energy use. The approach generalizes to other generative tasks where the model's initial drafts can be corrected against gold references, offering a scalable and practical path to continual, domain-sensitive adaptation.
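The single-stage objective described above, a preference signal combined with an SFT term, can be sketched for one preference pair as follows. This is a minimal sketch that assumes the commonly used CPO formulation (a log-sigmoid preference term over sequence log-probabilities plus a negative log-likelihood term on the chosen translation). The function name `cpo_loss`, the default `beta` and `nll_weight` values, and the use of plain floats instead of model outputs are illustrative assumptions, not the authors' implementation.

```python
import math


def cpo_loss(logp_chosen: float, logp_rejected: float,
             beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """CPO-style loss for a single preference pair (illustrative sketch).

    logp_chosen / logp_rejected are the summed token log-probabilities the
    current policy assigns to the TM translation and to its own raw draft.
    beta and nll_weight are assumed hyperparameters, not values from the paper.
    """
    margin = beta * (logp_chosen - logp_rejected)
    # Preference term: -log(sigmoid(margin)), written as a numerically
    # stable softplus(-margin).
    preference = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # SFT term: negative log-likelihood of the chosen translation.
    nll = -logp_chosen
    return preference + nll_weight * nll


# Draft and reference equally likely: the preference term equals log(2).
print(cpo_loss(-10.0, -10.0))
# Reference far more likely than the draft: the preference term shrinks.
print(cpo_loss(-10.0, -40.0))
```

Because both terms are computed from the policy alone, a formulation like this needs no separate frozen reference model, which is consistent with the single-stage framing in the summary.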

Abstract

LLMs often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on supervised fine-tuning (SFT). We present an empirical study on applying Contrastive Preference Optimization (CPO) to simulate a post-editing workflow for data-efficient domain adaptation. Our approach synthesizes preference pairs by treating the base model's own raw output as the 'rejected' translation and the human-approved translation memory (TM) entry as the 'chosen' one. This method provides direct feedback on the model's current knowledge, guiding it to align with domain-specific standards. Experiments in English–Brazilian Portuguese and English–Korean show that, using just 14.7k preference pairs, the model achieves performance close to that of a model trained on 160k+ samples with SFT, demonstrating significant data efficiency. Although we showcase its effectiveness in MT, this application of CPO naturally generalizes to other generative tasks where a model's initial drafts can serve as a contrastive signal against a gold reference.
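The pair-synthesis step described in the abstract can be sketched as below. This is a hedged illustration, not the authors' pipeline: `PreferencePair`, `build_pairs`, the prompt template, and the `translate` callable (a stand-in for the base model's raw translation) are hypothetical names. Skipping pairs where the draft already equals the TM entry is an assumption made here, since such pairs carry no contrast.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # human-approved TM translation
    rejected: str  # base model's own raw draft


def build_pairs(
    tm_entries: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
    src_lang: str = "English",
    tgt_lang: str = "Brazilian Portuguese",
    skip_identical: bool = True,
) -> List[PreferencePair]:
    """Turn (source, reference) TM entries into preference pairs."""
    pairs = []
    for source, reference in tm_entries:
        draft = translate(source)  # hypothetical call to the base model
        # Assumption: an identical draft gives no preference signal.
        if skip_identical and draft.strip() == reference.strip():
            continue
        # Hypothetical prompt template, for illustration only.
        prompt = f"Translate from {src_lang} to {tgt_lang}:\n{source}"
        pairs.append(PreferencePair(prompt, chosen=reference, rejected=draft))
    return pairs


# Usage with a toy stand-in for the base model.
toy_model = {"Hello": "Olá", "Save file": "Guardar ficheiro"}
tm = [("Hello", "Olá"), ("Save file", "Salvar o arquivo")]
pairs = build_pairs(tm, lambda s: toy_model[s])
print(len(pairs), pairs[0].rejected)
```

Records with `prompt`, `chosen`, and `rejected` fields are a common input shape for preference-optimization trainers, so pairs built this way can usually be fed to such a trainer with little reshaping.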
