Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
Zhi Qu, Chenchen Ding, Taro Watanabe
TL;DR
The paper investigates why zero-shot translation underperforms in multilingual NMT by analyzing how representations transfer across languages. It introduces identity pairs (a sentence translated to itself) as a base measure and demonstrates that the encoder transfers source-language representations into the target-language subspace rather than into a language-agnostic state. Zero-shot translation suffers when this transfer is incomplete and the representation remains entangled with other languages. To address this, the authors propose Low-Rank Language-specific Embedding (LoLE) at the encoder and Language-specific Contrastive Learning of Representations (LCLR) at the decoder, which substantially improve zero-shot performance on three benchmarks without degrading supervised directions. The findings offer a practical way to improve representation transfer in MNMT and clarify how language information is represented and disentangled in encoder-decoder architectures.
Abstract
Understanding representation transfer in multilingual neural machine translation (MNMT) can reveal the reason for the zero-shot translation deficiency. In this work, we systematically analyze the representational issue of MNMT models. We first introduce the identity pair, which translates a sentence to itself, to address the lack of a base measure in multilingual investigations, as the identity pair can reflect the representation of a language within the model. Then, we demonstrate that the encoder transfers the source language to the representational subspace of the target language rather than to a language-agnostic state. Thus, the zero-shot translation deficiency arises because the representation of a translation is entangled with other languages and is not transferred to the target language effectively. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. The experimental results on the Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance the performance of zero-shot translations without sacrificing performance in supervised directions by improving language transfer capacity, thereby providing practical evidence to support our conclusions. Code is available at https://github.com/zhiqu22/ZeroTrans.
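The identity pair can be illustrated with a minimal Python sketch. It is not taken from the released code: the `Pair` type, the `make_identity_pairs` helper, the toy sentences, and the three-dimensional vectors are all hypothetical, and cosine similarity over pooled encoder states is only one assumed way of comparing a translation's representation against the identity-pair baseline.

```python
import math
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """A translation instance; hypothetical container for this sketch."""
    src_lang: str
    tgt_lang: str
    src_text: str
    tgt_text: str


def make_identity_pairs(sentences: Dict[str, List[str]]) -> List[Pair]:
    """Build one identity pair (a sentence translated to itself) per sentence."""
    return [
        Pair(lang, lang, text, text)
        for lang, texts in sentences.items()
        for text in texts
    ]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy data: sentences and vectors are made up for illustration only.
corpus = {"en": ["Good morning."], "de": ["Guten Morgen."]}
identity_pairs = make_identity_pairs(corpus)

# Assumed pooled encoder states; a real study would extract these from the model.
rep_de_identity = [0.9, 0.1, 0.0]  # de -> de, the base measure for German
rep_en_identity = [0.1, 0.9, 0.0]  # en -> en, the base measure for English
rep_en_to_de = [0.8, 0.3, 0.0]     # en -> de translation

# Compare the translation's representation with each language's baseline.
sim_to_target = cosine(rep_en_to_de, rep_de_identity)
sim_to_source = cosine(rep_en_to_de, rep_en_identity)
```

In this toy setup, a translation whose encoder state lies closer to the target-language identity pair than to the source-language one would be consistent with the claim that the encoder moves the source toward the target-language subspace.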
