A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation

Francois Meyer; Jan Buys

TL;DR

This paper investigates how subword segmentation affects multilingual MT and cross-lingual transfer, using English-to-Siswati and related South African languages to compare five subword methods. Across two experimental tracks (multilingual MT and cross-lingual finetuning) and several linguistic typologies, it shows that subword regularisation via ULM strengthens synergy in multilingual settings, whereas BPE better facilitates transfer during finetuning. The study also suggests that differences in orthographic word boundary conventions can impede cross-lingual transfer more than linguistic unrelatedness does, which makes orthography a key factor in multilingual modelling. In practice, the findings point to task-specific subword choices: ULM for multilingual synergy and BPE for cross-lingual transfer, with close attention to orthographic differences between languages.

Abstract

Multilingual modelling can improve machine translation for low-resource languages, partly through shared subword representations. This paper studies the role of subword segmentation in cross-lingual transfer. We systematically compare the efficacy of several subword methods in promoting synergy and preventing interference across different linguistic typologies. Our findings show that subword regularisation boosts synergy in multilingual modelling, whereas BPE more effectively facilitates transfer during cross-lingual fine-tuning. Notably, our results suggest that differences in orthographic word boundary conventions (the morphological granularity of written words) may impede cross-lingual transfer more significantly than linguistic unrelatedness. Our study confirms that decisions around subword modelling can be key to optimising the benefits of multilingual modelling.
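To make the contrast between deterministic segmentation and subword regularisation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy vocabulary with made-up, unnormalised unigram scores (`TOY_VOCAB`), an illustrative input string, and a smoothing parameter `alpha`; none of these come from the paper. A deterministic segmenter always returns the single highest-scoring split, whereas a regularised one samples among the valid splits, so the model sees varied segmentations of the same word during training.

```python
import math
import random

# Hypothetical toy vocabulary: piece -> made-up unigram score (assumption,
# not taken from the paper; scores need not sum to 1 for this illustration).
TOY_VOCAB = {
    "u": 0.05, "n": 0.05, "g": 0.03, "i": 0.05, "a": 0.06,
    "f": 0.02, "d": 0.02, "y": 0.02,
    "ng": 0.08, "ngi": 0.10, "ya": 0.12, "fu": 0.07,
    "nda": 0.05, "ngiya": 0.01, "funda": 0.15,
}


def segmentations(word, vocab):
    """Enumerate every way to split `word` into in-vocabulary pieces."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


def log_score(pieces, vocab):
    """Sum of log unigram scores of the pieces in one segmentation."""
    return sum(math.log(vocab[p]) for p in pieces)


def best_segmentation(word, vocab):
    """Deterministic choice: the single highest-scoring segmentation."""
    return max(segmentations(word, vocab), key=lambda s: log_score(s, vocab))


def sample_segmentation(word, vocab, alpha=0.5, rng=random):
    """Regularised choice: sample a segmentation with weight score**alpha.

    Smaller `alpha` flattens the distribution, giving more varied splits.
    """
    candidates = segmentations(word, vocab)
    weights = [math.exp(alpha * log_score(s, vocab)) for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    word = "ngiyafunda"  # illustrative input string only
    print("deterministic:", best_segmentation(word, TOY_VOCAB))
    rng = random.Random(0)
    for _ in range(3):
        print("sampled:      ", sample_segmentation(word, TOY_VOCAB, rng=rng))
```

Exhaustive enumeration is only feasible for short toy inputs; practical tokenisers use dynamic programming or n-best lattices instead. The sketch is meant only to show why sampled segmentations expose a model to more subword variation than a single fixed split.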

Paper Structure (13 sections, 1 equation, 6 figures, 3 tables)

Introduction
Related Work
Methodology
Multilingual Modelling
Cross-Lingual Finetuning
Experimental Setup
Training
Results & Discussion
Which subwords promote synergy and minimise interference?
Which subwords transfer cross-lingually?
What is the role of linguistic typology?
Conclusion
Model Configurations

Figures (6)

Figure 1: Performance increase for English$\rightarrow$Siswati through multilingual modelling varies greatly across subword methods and linguistic contexts.
Figure 1: We vary the language modelled alongside Siswati to control relatedness, morphology, and orthography.
Figure 2: Multilingual experimental setup: bilingual and trilingual models (bilingual OBPE is equivalent to BPE).
Figure 2: Performance change for en$\rightarrow$xh/af/ts through multilingual modelling alongside en$\rightarrow$ss.
Figure 3: Test set chrF++ of trilingual models.
...and 1 more figure
