
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data

Wei Zou, Sen Yang, Yu Bao, Shujian Huang, Jiajun Chen, Shanbo Cheng

TL;DR

This paper tackles the challenge of multilingual machine translation in low-data regimes by eliminating reliance on parallel corpora. It introduces Trans-Zero, a self-play framework that leverages only monolingual data and the intrinsic multilingual capabilities of large language models (LLMs) to bootstrap translation quality. The core methodology combines a Multilingual Translation Process (MTP) with Genetic Monte-Carlo Tree Search (G-MCTS) to explore candidate translations, scores those candidates by their semantic consistency across languages, and refines the model with Self-Play Preference Optimization (SPPO). Empirical results show that Trans-Zero achieves performance competitive with supervised fine-tuning (SFT) on several language pairs and excels in non-English directions; they also indicate that increasing the number of languages used during search can raise the method’s upper bound. Overall, the work points to a resource-efficient direction for MT that shifts from parallel-supervised data to self-supervised learning on monolingual data, enabled by the multilingual priors of modern LLMs.
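The SPPO refinement step can be illustrated with a minimal sketch, assuming the squared-loss form of SPPO commonly used in practice: the log-ratio between the model being trained and the previous iteration's model is regressed toward eta * (P(y beats pi_t | x) - 1/2). How Trans-Zero turns its search rewards into the win probability P is not described here, so the helper `rewards_to_win_probs` (each candidate's pairwise win rate within its group) is a hypothetical stand-in, as are all function names and the default eta.

```python
from typing import List, Sequence


def sppo_loss(logp_policy: float, logp_ref: float, win_prob: float, eta: float = 1.0) -> float:
    """Squared-error SPPO loss for one candidate y of prompt x (sketch).

    logp_policy: log pi_theta(y | x) under the model being trained
    logp_ref:    log pi_t(y | x) under the frozen previous-iteration model
    win_prob:    estimate of P(y beats pi_t | x), in [0, 1]
    eta:         step size (hypothetical default)
    """
    log_ratio = logp_policy - logp_ref
    target = eta * (win_prob - 0.5)
    return (log_ratio - target) ** 2


def rewards_to_win_probs(rewards: Sequence[float]) -> List[float]:
    """Hypothetical mapping from search rewards to win-probability estimates:
    each candidate's win rate against the other candidates (ties count 0.5)."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two candidates to estimate win rates")
    probs = []
    for i, r_i in enumerate(rewards):
        wins = sum(1.0 if r_i > r_j else 0.5 if r_i == r_j else 0.0
                   for j, r_j in enumerate(rewards) if j != i)
        probs.append(wins / (n - 1))
    return probs


if __name__ == "__main__":
    probs = rewards_to_win_probs([0.9, 0.6, 0.3])
    print(probs)  # [1.0, 0.5, 0.0]
    print(sppo_loss(logp_policy=-1.0, logp_ref=-2.0, win_prob=probs[0], eta=2.0))  # 0.0
```

Minimizing this loss raises the log-probability of candidates that win more than half of their comparisons and lowers it for the rest, always relative to the previous iteration's model rather than to a fixed reference.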

Abstract

The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), which faces challenges such as data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of LLMs. TRANS-ZERO combines Genetic Monte-Carlo Tree Search (G-MCTS) with preference optimization, achieving strong translation performance that rivals supervised methods. Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions. Further analysis reveals that G-MCTS itself significantly enhances translation quality by exploring semantically consistent candidates through iterative translations, providing a robust foundation for the framework's success.
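The semantic-consistency assessment through iterative translations can be sketched as follows, under explicit assumptions. Each hop samples b translations of every current text; intermediate hops go to pivot languages and the final hop returns to the source language, so n hops yield b^n reconstructions (nine for b=3, n=2, matching the English-to-Italian example in Figure 2). The reward is the mean similarity between the source and its reconstructions. The `translate` callable, the pivot-language schedule, and the word-overlap (Jaccard) similarity, which stands in for whatever semantic measure the paper actually uses, are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: translate(text, src_lang, tgt_lang, k) -> k sampled translations.
Translator = Callable[[str, str, str, int], List[str]]


def jaccard(a: str, b: str) -> float:
    """Stand-in for semantic similarity: overlap of lower-cased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def rollout_reconstructions(candidate: str, cand_lang: str, src_lang: str,
                            pivots: Sequence[str], translate: Translator,
                            b: int = 3, n: int = 2) -> List[str]:
    """Translate b ways per hop for n hops; the last hop returns to src_lang,
    so the leaves are b**n reconstructions of the source sentence."""
    if n > 1 and not pivots:
        raise ValueError("intermediate hops need at least one pivot language")
    frontier: List[Tuple[str, str]] = [(candidate, cand_lang)]
    for hop in range(n):
        tgt = src_lang if hop == n - 1 else pivots[hop % len(pivots)]
        frontier = [(out, tgt)
                    for text, lang in frontier
                    for out in translate(text, lang, tgt, b)]
    return [text for text, _ in frontier]


def consistency_reward(source: str, candidate: str, cand_lang: str, src_lang: str,
                       pivots: Sequence[str], translate: Translator,
                       b: int = 3, n: int = 2) -> float:
    """Mean similarity between the source and all b**n reconstructions."""
    recs = rollout_reconstructions(candidate, cand_lang, src_lang, pivots, translate, b, n)
    return sum(jaccard(source, r) for r in recs) / len(recs)


if __name__ == "__main__":
    def toy_translate(text, src, tgt, k):
        # Toy stub: pretends every hop preserves meaning perfectly.
        return ["the cat sleeps" if tgt == "en" else text] * k

    recs = rollout_reconstructions("il gatto dorme", "it", "en", ["de"], toy_translate)
    print(len(recs))  # 9
    print(consistency_reward("the cat sleeps", "il gatto dorme", "it", "en",
                             ["de"], toy_translate))  # 1.0
```

A real translator would return diverse samples, so a poor candidate would produce reconstructions that drift from the source and lower the mean score; the branching factor b and depth n trade simulation cost against how robust that estimate is.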

Paper Structure

This paper contains 25 sections, 5 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of Trans-Zero. Once the tree is initialized, the search cycles through selection, expansion, simulation, and back-propagation for new nodes. b) G-MCTS selects the node with the maximum UCB to expand a new child node. There are two types of genetic expansion: merge and mutate. c) A mass roll-out simulation of MTP trajectories assesses semantic consistency, and the assessed reward is back-propagated to guide the search. d) Finally, we harvest the search tree into data pairs for preference optimization.
  • Figure 2: An example of simulation on an English-to-Italian translation candidate using $b=3$ and $n=2$. Through roll-outs of MTP, the Italian candidate is assessed by the semantic consistency of $b^n$ English reconstructions $\{\text{en}_\omega^1, \cdots, \text{en}_\omega^9\}$ from simulated trajectories.
  • Figure 3: BLEURT performance for SFT based on the Llama3.1-Base at different data sizes. We include the performance of Trans-Zero in each language direction.
  • Figure 4: The learning diagram of Trans-Zero on Llama3.1-Base for German-to-Chinese translation demonstrates the search process in 4-language and 6-language settings under G-MCTS. By incorporating 6 languages, Trans-Zero attains BLEURT scores on par with the baseline systems.
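The search loop in Figure 1 can be sketched as a compact Python skeleton. It follows the caption's phases: select the node with maximum UCB, expand a child by a genetic merge or mutate operation, score the child with a simulation reward, back-propagate that reward to the root, and finally harvest chosen/rejected pairs. Several details are assumptions rather than the paper's method: UCB1 as the score, selection over all nodes (a literal reading of the caption) instead of a root-to-leaf descent, a 50/50 choice between merge and mutate, best-versus-worst pairing when harvesting, and the `mutate`, `merge`, and `reward_fn` callables, which in Trans-Zero would be LLM calls and the MTP simulation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality avoids recursive parent/child comparisons
class Node:
    text: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    reward: float = 0.0   # this node's own simulation reward
    visits: int = 0
    value: float = 0.0    # sum of rewards back-propagated through this node


def ucb(node: Node, total_visits: int, c: float) -> float:
    """UCB1: mean value plus an exploration bonus; unvisited nodes go first."""
    if node.visits == 0:
        return math.inf
    mean = node.value / node.visits
    return mean + c * math.sqrt(math.log(total_visits) / node.visits)


def all_nodes(root: Node) -> List[Node]:
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def backpropagate(node: Optional[Node], reward: float) -> None:
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent


def g_mcts(root_text: str, reward_fn: Callable[[str], float],
           mutate: Callable[[str], str], merge: Callable[[str, str], str],
           iterations: int = 20, c: float = 1.0, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(root_text, reward=reward_fn(root_text))
    backpropagate(root, root.reward)
    for _ in range(iterations):
        nodes = all_nodes(root)
        best = max(nodes, key=lambda n: ucb(n, root.visits, c))   # selection
        others = [n for n in nodes if n is not best]
        if others and rng.random() < 0.5:                           # genetic expansion
            text = merge(best.text, rng.choice(others).text)
        else:
            text = mutate(best.text)
        child = Node(text, parent=best, reward=reward_fn(text))   # simulation
        best.children.append(child)
        backpropagate(child, child.reward)                        # back-propagation
    return root


def harvest_pairs(root: Node, margin: float = 0.0) -> List[Tuple[str, str]]:
    """Pair the i-th highest-reward node with the i-th lowest as (chosen, rejected)."""
    ranked = sorted(all_nodes(root), key=lambda n: n.reward, reverse=True)
    pairs = []
    for i in range(len(ranked) // 2):
        chosen, rejected = ranked[i], ranked[-1 - i]
        if chosen.reward - rejected.reward > margin:
            pairs.append((chosen.text, rejected.text))
    return pairs


if __name__ == "__main__":
    def toy_reward(t: str) -> float:
        # Toy stand-in: prefers outputs close to four words.
        return 1.0 / (1 + abs(len(t.split()) - 4))

    tree = g_mcts("a b", toy_reward, mutate=lambda t: t + " x",
                  merge=lambda a, b: a + " " + b, iterations=5)
    print(len(all_nodes(tree)), tree.visits)  # 6 6
    print(harvest_pairs(tree))
```

Harvesting on each node's own reward, rather than the back-propagated mean, keeps a parent's score from being inflated by good descendants when it is used as a preference example; the search itself still steers by the back-propagated values.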