Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization

Inacio Vieira, Antonio Castaldo, James O'Doherty, Sheila Castilho

TL;DR

We address domain adaptation in machine translation (MT) by applying Contrastive Preference Optimization (CPO) to align LLMs with domain-specific terminology and style using little data. The method simulates a post-editing workflow: the base model's raw translation serves as the negative example and a human-approved translation from a domain translation memory (TM) serves as the positive one. Training is on-policy and single-stage, with an objective that combines a preference signal with standard supervised fine-tuning (SFT). Results on English–Brazilian Portuguese and English–Korean show that 14.7k synthetic preference pairs can match or exceed SFT trained on 160k+ in-domain examples, while using about 51% fewer GPU-seconds and less energy. The approach generalizes to other generative tasks where a model's initial drafts can be corrected against gold references, offering a scalable and practical path to continual, domain-sensitive adaptation.
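The combined objective can be illustrated with a minimal sketch. It assumes the commonly cited form of CPO: a log-sigmoid preference term on the difference between the sequence log-probabilities of the chosen and rejected translations, plus a negative log-likelihood (SFT) term on the chosen translation. The function name `cpo_loss`, the value of `beta`, and the equal weighting of the two terms are assumptions for illustration, not details confirmed by this summary.

```python
import math


def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Per-example CPO-style loss (illustrative sketch, not the paper's code).

    logp_chosen / logp_rejected: sequence log-probabilities of the chosen
    (TM) and rejected (base model draft) translations under the model
    being trained. beta is an assumed scaling hyperparameter.
    """
    margin = beta * (logp_chosen - logp_rejected)
    # Preference term: -log(sigmoid(margin)), written in a numerically stable way.
    if margin > 0:
        prefer = math.log1p(math.exp(-margin))
    else:
        prefer = -margin + math.log1p(math.exp(margin))
    # SFT term: negative log-likelihood of the chosen translation.
    nll = -logp_chosen
    return prefer + nll


# A wider gap in favour of the chosen translation lowers the preference term.
wide_gap = cpo_loss(logp_chosen=-5.0, logp_rejected=-20.0)
narrow_gap = cpo_loss(logp_chosen=-5.0, logp_rejected=-6.0)
print(wide_gap < narrow_gap)
```

Because both terms depend only on the model being trained, a loss of this shape needs no separate reference model, which is consistent with the single-stage description above.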

Abstract

LLMs often require adaptation to domain-specific requirements, a process that can be expensive when it relies solely on supervised fine-tuning (SFT). We present an empirical study that applies Contrastive Preference Optimization (CPO) to simulate a post-editing workflow for data-efficient domain adaptation. Our approach synthesizes preference pairs by treating the base model's own raw output as the 'rejected' translation and the human-approved translation memory (TM) entry as the 'chosen' one. This gives the model direct feedback on its current knowledge and guides it toward domain-specific standards. Experiments on English–Brazilian Portuguese and English–Korean show that, with just 14.7k preference pairs, the model performs close to one trained on 160k+ samples with SFT, demonstrating significant data efficiency. Although we showcase its effectiveness in machine translation (MT), this application of CPO naturally generalizes to other generative tasks where a model's initial drafts can serve as a contrastive signal against a gold reference.
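The pair synthesis described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: `build_preference_pairs` and `fake_base_model` are hypothetical names, the `prompt`/`chosen`/`rejected` field names follow a common convention for preference datasets rather than anything stated here, and skipping segments where the draft already equals the TM entry is an added assumption (such pairs would carry no contrast).

```python
from typing import Callable, Dict, List, Tuple


def build_preference_pairs(
    tm_entries: List[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Turn (source, approved translation) TM entries into preference pairs.

    translate: any callable that returns the base model's raw translation
    of a source segment (a stand-in for real model inference).
    """
    pairs = []
    for source, approved in tm_entries:
        draft = translate(source)  # base model's own output -> 'rejected'
        # Assumption: drop pairs with no contrast between draft and TM entry.
        if draft.strip() == approved.strip():
            continue
        pairs.append({"prompt": source, "chosen": approved, "rejected": draft})
    return pairs


# Toy stand-in for base model inference (hypothetical outputs).
def fake_base_model(source: str) -> str:
    drafts = {"Open the file.": "Abra o ficheiro.", "Save": "Salvar"}
    return drafts[source]


tm = [("Open the file.", "Abra o arquivo."), ("Save", "Salvar")]
pairs = build_preference_pairs(tm, fake_base_model)
print(pairs)
```

In this toy run, only the first segment yields a pair, because the model's draft for the second already matches the approved translation.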

Paper Structure

This paper contains 29 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: End-to-end workflow showing (1) synthetic preference pair generation through base model inference, (2) CPO fine-tuning, and (3) automatic evaluation.
  • Figure 2: Generating the synthetic 'rejected' candidate by running inference with TM source text on the base model.
  • Figure 3: Comparison of COMET scores for EN>PTBR between SFT and CPO.
  • Figure 4: Comparison of COMET scores for EN>KO between SFT and CPO.
  • Figure 5: Adaptive MT: Symbiotic Human and LLM Translation Feedback Loop. Contrastive training on post-edited machine translation triplets.