Self-Translate-Train: Enhancing Cross-Lingual Transfer of Large Language Models via Inherent Capability

Ryokan Ri, Shun Kiyono, Sho Takase

TL;DR

This work tackles the gap in zero-shot cross-lingual transfer caused by misaligned multilingual representations. It introduces Self-Translate-Train, which uses the model’s own translation capability to generate target-language training data and then fine-tunes the model on these synthetic translations, optionally including code-switching data. Across math, QA, and NLI tasks in German, Russian, Thai, and Chinese (de, ru, th, zh), the method improves cross-lingual performance over English-only fine-tuning, with larger models showing stronger improvements, though translation quality in Thai remains a bottleneck. The findings suggest that eliciting inherent cross-lingual capabilities through self-translation can reduce the need for external data, and they point to future work on pre-training and in-context strategies for multilingual generalization.

Abstract

Zero-shot cross-lingual transfer by fine-tuning multilingual pretrained models shows promise for low-resource languages, but often suffers from misalignment of internal representations between languages. We hypothesize that even when the model cannot generalize across languages effectively in fine-tuning, it still captures cross-lingual correspondence useful for cross-lingual transfer. We explore this hypothesis with Self-Translate-Train, a method that lets large language models (LLMs) translate training data into the target language and then fine-tunes the model on its own generated data. By demonstrating that Self-Translate-Train outperforms zero-shot transfer, we encourage further exploration of better methods to elicit cross-lingual capabilities of LLMs.
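
The data-generation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_self_translated_set`, `toy_translate`, and the `keep_english` flag are hypothetical names, the `translate` callable stands in for prompting the same LLM that will later be fine-tuned, and whether the English originals are kept alongside the translations is an assumption made here for illustration.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def build_self_translated_set(
    english_examples: List[Example],
    translate: Callable[[str, str], str],
    target_lang: str,
    keep_english: bool = True,  # assumption: mixing in the originals is optional
) -> List[Example]:
    """Return a fine-tuning set made of English examples and their translations."""
    # Translate every text field of every example into the target language.
    synthetic = [
        {field: translate(text, target_lang) for field, text in example.items()}
        for example in english_examples
    ]
    originals = list(english_examples) if keep_english else []
    return originals + synthetic


def toy_translate(text: str, target_lang: str) -> str:
    # Hypothetical stand-in: a real run would prompt the model being fine-tuned.
    return f"[{target_lang}] {text}"


train_set = build_self_translated_set(
    [{"question": "What is 2 + 3?", "answer": "5"}],
    toy_translate,
    "de",
)
```

The resulting `train_set` would then be passed to an ordinary fine-tuning loop. The point of the sketch is only that the translator and the fine-tuned model are the same model, so no external translation system or target-language data is required.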

Paper Structure (31 sections, 5 figures, 13 tables)

Figures (5)

  • Figure 1: An overview of Self-Translate-Train.
  • Figure 2: Accuracy in the MGSM dataset with different model sizes of Llama2.
  • Figure 3: An input and output example of the SQuAD dataset.
  • Figure 4: An input and output example of the MultiNLI dataset.
  • Figure 5: Prompt format for LLM translation.