Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Huangyu Dai, Ben Chen, Kaidi Chen, Ying Han, Zihan Liang, Wen Jiang

TL;DR

A novel algorithm named Contrastive Token Learning with Similarity Decay (CTSD) is introduced, which modulates the suppression of tokens dynamically, informed by varying attention weights and inter-token distances, and significantly outperforms existing approaches in precision and generalizability.

Abstract

For crosslingual conversation and trade, Neural Machine Translation (NMT) is pivotal yet faces persistent challenges with monotony and repetition in generated content. Traditional solutions that rely on penalizing text redundancy or token reoccurrence have shown limited efficacy, particularly for lengthy articles and e-commerce descriptions with inherent redundancy, even with the advent of Large Language Models (LLMs). This paper investigates the underlying causes of textual repetition through the lens of information entropy, attributing the phenomenon to elevated uncertainty within the input text. To address this, a novel algorithm named Contrastive Token Learning with Similarity Decay (CTSD) is introduced, which modulates the suppression of tokens dynamically, informed by varying attention weights and inter-token distances. Furthermore, an e-commerce dataset of title texts from real online items, which are susceptible to hallucinated translations, is compiled and released to benchmark the algorithm. Extensive evaluations demonstrate that CTSD significantly outperforms existing approaches in precision and generalizability. Additional online A/B testing underscores its practical value, showing marked improvements in user engagement and conversion. Notably, this method has been deployed with full traffic on eight multilingual sites of alibaba.com, the largest B2B e-commerce platform in the world.

Paper Structure (16 sections, 6 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: ALTI+ results for En-De translation examples: (a) a normal result, (b) a result with repetition appearing mid-sequence, and (c) a fully repetitive result. The contribution values of all tokens in each row have been normalized.
  • Figure 2: t-SNE results of different generated tokens: (a) a result with repetition appearing mid-sequence and (b) a fully repetitive result.
  • Figure 3: Attenuation factors of different generated tokens: (a) attention similarity and (b) exponential decay matrix.
  • Figure 4: The impact of hyperparameters W and N on the translation quality and reproducibility of the NLLB-1.3B model.
  • Figure 5: The impact of hyperparameters W and N on the translation quality and reproducibility of the Qwen-7B model.