NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in Low-Resource Languages

Mamadou K. Keita, Christopher Homan, Huy Le

TL;DR

NSL-MT tackles low-resource machine translation by injecting linguistically informed negative evidence into the training objective, explicitly penalizing outputs that violate target-language grammar. It generates hard negatives along morphological, syntactic, and lexical dimensions, each with a severity weight, and uses them alongside standard likelihood training. The method yields BLEU gains of up to 89% on weaker baselines and a 5x data-efficiency benefit: training on 1,000 examples matches or exceeds standard training on 5,000. Across languages and architectures, NSL-MT reduces form-related errors while preserving meaning, offering a way to improve translation quality with limited parallel data and minimal additional linguistic resources.

Abstract

We introduce Negative Space Learning MT (NSL-MT), a training method that teaches models what not to generate by encoding linguistic constraints as severity-weighted penalties in the loss function. NSL-MT augments limited parallel data with synthetically generated violations of target-language grammar, explicitly penalizing the model when it assigns high probability to these linguistically invalid outputs. We demonstrate that NSL-MT delivers improvements across all architectures: 3-12% BLEU gains for well-performing models and 56-89% gains for models lacking decent initial support. Furthermore, NSL-MT provides a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds normal training with 5,000 examples. Thus, NSL-MT provides a data-efficient alternative training method for settings where annotated parallel corpora are limited.
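
The abstract describes the objective only in words, so the following is a minimal sketch of what a severity-weighted penalty could look like, not the paper's actual loss. It assumes an unlikelihood-style term, -log(1 - p(negative)), added to the usual negative log-likelihood of the reference translation. The function name `nsl_style_loss`, the `penalty_weight` parameter, and the choice of penalty form are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nsl_style_loss(
    pos_token_log_probs: Sequence[float],
    negatives: List[Tuple[Sequence[float], float]],
    penalty_weight: float = 1.0,
) -> float:
    """Toy loss: NLL of the reference plus severity-weighted penalties.

    pos_token_log_probs: per-token log-probabilities the model assigns
        to the reference translation.
    negatives: (per-token log-probabilities, severity) pairs, one per
        synthetic grammar violation. Severity values are assumed inputs.
    penalty_weight: hypothetical scalar balancing the two terms.
    """
    # Standard likelihood term: negative log-probability of the reference.
    nll = -sum(pos_token_log_probs)

    penalty = 0.0
    for neg_token_log_probs, severity in negatives:
        # Sequence probability of the invalid output.
        p_neg = math.exp(sum(neg_token_log_probs))
        # Clamp so that log(1 - p) stays finite.
        p_neg = min(p_neg, 1.0 - 1e-12)
        # Assumed penalty form: grows as the model puts more mass on
        # the violation, scaled by how severe the violation is.
        penalty += severity * -math.log1p(-p_neg)

    return nll + penalty_weight * penalty
```

With an empty `negatives` list the function reduces to ordinary likelihood training, which is one way to read the claim that the negative signal complements the standard objective rather than replacing it.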

Paper Structure

This paper contains 35 sections, 4 equations, 1 figure, 5 tables, 1 algorithm.

Figures (1)

  • Figure 1: Data efficiency comparison between normal training (in red) and NSL-MT (in green) across varying training set sizes. NSL-MT achieves high performance at all data sizes, with the largest relative gains occurring at the smallest data sizes.