Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

Zhi Qu; Chenchen Ding; Taro Watanabe

TL;DR

The paper investigates why zero-shot translations underperform in multilingual NMT by analyzing how representations transfer across languages. It introduces identity pairs as a base measure and demonstrates that the encoder transfers source-language representations into the target-language subspace, causing entanglements that hinder zero-shot transfer. To address this, the authors propose Low-Rank Language-specific Embedding (LoLE) and Language-specific Contrastive Learning of Representations (LCLR), which substantially improve zero-shot performance across three benchmarks without degrading supervised directions. The findings offer a practical pathway to enhance multilingual representation transfer in MNMT and deepen understanding of how language information is represented and disentangled in encoder-decoder architectures.

Abstract

Understanding representation transfer in multilingual neural machine translation (MNMT) can reveal the reason for the zero-shot translation deficiency. In this work, we systematically analyze the representational issue of MNMT models. We first introduce the identity pair, translating a sentence to itself, to address the lack of a base measure in multilingual investigations, as the identity pair can reflect the representation of a language within the model. Then, we demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state. Thus, the zero-shot translation deficiency arises because the representation of a translation is entangled with other languages and not transferred to the target language effectively. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. The experimental results on Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance the performance of zero-shot translations without sacrifices in supervised directions by improving language transfer capacity, thereby providing practical evidence to support our conclusions. Code is available at https://github.com/zhiqu22/ZeroTrans.
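
The identity pair described in the abstract (translating a sentence to itself) can be illustrated with a minimal sketch. The function name `make_identity_pairs` and the `<2xx>` tag format prepended to the source side are assumptions made for illustration only; the paper's actual preprocessing may differ.

```python
from typing import List, Tuple


def make_identity_pairs(sentences: List[str], lang: str) -> List[Tuple[str, str]]:
    """Build (source, target) pairs in which each sentence is paired with itself.

    Assumption: the target language is signalled by a tag such as "<2de>"
    prepended to the source sentence. This tag format is hypothetical.
    """
    tag = f"<2{lang}>"
    return [(f"{tag} {sentence}", sentence) for sentence in sentences]


if __name__ == "__main__":
    # Each German sentence becomes a de->de "translation" of itself.
    for src, tgt in make_identity_pairs(["Guten Morgen", "Danke schön"], "de"):
        print(src, "=>", tgt)
```

Feeding such pairs through a trained model would give a reference representation for each language, against which representations of real translation directions can be compared.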

Paper Structure (37 sections, 5 equations, 14 figures, 5 tables)

Introduction
Background
Multilingual Neural Machine Translation
The Discrepancy in Prior Works
Investigating Representation Transfer in MNMT
Identity Pairs
Language Transfer Within the Encoder
Entanglements Hindering the Transfer
Language Features in the Decoder
Encouraging Representation Transfer
Low-Rank Embedding for the Encoder
Contrastive Learning for the Decoder
Experiments
Setup
Datasets
...and 22 more sections

Figures (14)

Figure 1: Different analytical methods lead to different conclusions. \ref{fig:cluster} shows that representations of translations from English (en) to other languages are clustered by target language family after passing through the encoder. \ref{fig:align} indicates that the encoder semantically aligns different source languages. Language codes in this work follow ISO 639-1, and Appendix \ref{appendix:introduction} provides details of these figures.
Figure 2: Visualizations of layer-wise SVCCA scores for the encoder. (①, ②) indicate the source language and target language, respectively. The analyzed models have 6 encoder layers; the analysis of models with 8 and 10 encoder layers is shown in Figure 3.
Figure 3: Visualizations of layer-wise SVCCA scores for encoders with 8 and 10 layers on diverse languages in Europarl-15, for comparison with Figure 2, to show that the results generalize.
Figure 4: t-SNE plot of the token-level alignment between en$\to$en and x$\to$en in TED-19. Each point is a token's representation collected from the output of the encoder. Representations of different tokens are clustered by semantics, denoted by English phrases, with an overall variance of 0.09. Appendix \ref{appendix:alignment} provides more details.
Figure 5: Visualizations of the encoder's output by t-SNE and BiKDE. \ref{fig:entangle_1}, \ref{fig:entangle_2} and \ref{fig:entangle_5} are measured in Europarl-15. \ref{fig:entangle_3}, \ref{fig:entangle_4} and \ref{fig:entangle_6} are measured in TED-19.
...and 9 more figures
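
The layer-wise analysis in Figures 2 and 3 is based on SVCCA scores. As a rough illustration of what such a score measures, the sketch below computes a standard SVCCA similarity between two sets of representations with NumPy: each set is reduced by SVD, then the canonical correlations between the reduced sets are averaged. The function names, the 0.99 variance threshold, and the random data are assumptions for illustration; this is not the authors' implementation, which may pool or preprocess representations differently.

```python
import numpy as np


def _svd_reduce(x: np.ndarray, var_kept: float) -> np.ndarray:
    """Center x (samples x features) and keep the top singular directions
    that together explain at least `var_kept` of the variance."""
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
    return u[:, :k] * s[:k]


def svcca_score(x: np.ndarray, y: np.ndarray, var_kept: float = 0.99) -> float:
    """Mean canonical correlation between SVD-reduced x and y.

    Both inputs must have the same number of rows (aligned samples);
    the feature dimensions may differ.
    """
    x_red = _svd_reduce(x, var_kept)
    y_red = _svd_reduce(y, var_kept)
    # Orthonormal bases of the two reduced subspaces.
    qx, _ = np.linalg.qr(x_red)
    qy, _ = np.linalg.qr(y_red)
    # Singular values of qx^T qy are the canonical correlations.
    corr = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(corr, 0.0, 1.0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 16))                       # hypothetical layer outputs
    rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    print("rotated copy:", svcca_score(x, x @ rotation))
    print("unrelated   :", svcca_score(x, rng.normal(size=(200, 16))))
```

Because the score depends only on the subspaces spanned by the representations, a rotated copy of the same representations scores close to 1, while unrelated representations score much lower.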
