
Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features

Mengyu Bu, Shuhao Gu, Yang Feng

TL;DR

This work tackles zero-shot and supervised translation in many-to-many multilingual NMT by explicitly disentangling semantic information in the encoder from language-specific linguistic features and by injecting low-level linguistic cues into the decoder via a linguistic encoder. The proposed disentangler learns a universal semantic space, while a two-layer linguistic encoder combined with a fusion module guides target-language generation. Together they yield significant zero-shot gains without sacrificing supervised performance. Empirical results on IWSLT2017, OPUS-7, and PC-6 show notable improvements in zero-shot BLEU and reduced off-target rates, supported by ablations, visualizations, and case studies. The approach is relevant to robust multilingual translation and cross-lingual knowledge transfer, and the code is released for reproducibility.
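To make the idea of "fusing" linguistic-encoder features into the decoder concrete, here is a minimal sketch of one common pattern, a scalar sigmoid gate that mixes a decoder state with a linguistic feature vector. This is an assumption for illustration only: the gate form and the names `gated_fusion`, `w_dec`, `w_ling`, and `bias` are hypothetical and are not taken from the paper, whose fusion module may be designed differently.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(dec: List[float], ling: List[float],
                 w_dec: List[float], w_ling: List[float],
                 bias: float) -> List[float]:
    """Hypothetical gated fusion of a decoder state with linguistic features.

    A scalar gate g = sigmoid(w_dec . dec + w_ling . ling + bias) decides how
    much of each source to keep: output = g * dec + (1 - g) * ling.
    """
    if not (len(dec) == len(ling) == len(w_dec) == len(w_ling)):
        raise ValueError("all vectors must share the model dimension")
    score = (sum(w * d for w, d in zip(w_dec, dec))
             + sum(w * l for w, l in zip(w_ling, ling))
             + bias)
    g = sigmoid(score)
    return [g * d + (1.0 - g) * l for d, l in zip(dec, ling)]


# With zero weights and zero bias the gate is 0.5, i.e. a plain average.
print(gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], 0.0))
```

A learned gate lets the model decide per step how much target-language linguistic signal to inject; with a large positive bias the output collapses to the decoder state, and with a large negative bias to the linguistic features.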

Abstract

Many-to-many multilingual neural machine translation can be regarded as the process of integrating semantic features from the source sentences with linguistic features from the target sentences. To enhance zero-shot translation, models need to share knowledge across languages, which can be achieved through auxiliary tasks that learn a universal representation or a cross-lingual mapping. To this end, we propose to exploit both semantic and linguistic features across multiple languages to enhance multilingual translation. On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features, thus facilitating knowledge transfer while preserving complete information. On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features that assist target-language generation. Experimental results on multilingual datasets demonstrate significant improvement in zero-shot translation compared to the baseline system, while maintaining performance in supervised translation. Further analysis validates the effectiveness of our method in leveraging both semantic and linguistic features. The code is available at https://github.com/ictnlp/SemLing-MNMT.
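The encoder-side claim, that disentangling can align representations "while preserving complete information", can be illustrated with a toy residual split: a projection extracts a semantic part, the linguistic part is whatever remains, and summing the two recovers the original vector exactly. Only the semantic parts of a parallel sentence pair are pulled together by an alignment loss. This is a sketch under stated assumptions: the residual split, the projection `w_sem`, mean pooling, and squared-L2 alignment are all hypothetical choices, not the paper's disentangler or its training objective.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def disentangle(h: Sequence[float], w_sem: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical split: semantic s = W_sem h, linguistic l = h - s.

    By construction s + l == h, so the split itself discards nothing.
    """
    s = matvec(w_sem, h)
    l = [hi - si for hi, si in zip(h, s)]
    return s, l


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    # Average token states into one sentence vector.
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


def alignment_loss(src_states, tgt_states, w_sem: Matrix) -> float:
    """Squared L2 distance between pooled semantic parts of a parallel pair."""
    s_src = mean_pool([disentangle(h, w_sem)[0] for h in src_states])
    s_tgt = mean_pool([disentangle(h, w_sem)[0] for h in tgt_states])
    return sum((a - b) ** 2 for a, b in zip(s_src, s_tgt))


# Toy projection keeping only the first coordinate as "semantic".
W = [[1.0, 0.0], [0.0, 0.0]]
print(disentangle([2.0, 5.0], W))            # semantic [2.0, 0.0], linguistic [0.0, 5.0]
print(alignment_loss([[2.0, 5.0]], [[2.0, -1.0]], W))
```

In this toy example the two sentences share the same semantic part, so the alignment loss is zero even though their linguistic parts differ; the language-specific information is kept in the residual rather than being forced to match.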

Paper Structure

This paper contains 41 sections, 8 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: The framework of our method. We propose a disentangler to learn a universal semantic representation via disentangling and utilize a linguistic encoder to fuse low-level linguistic features.
  • Figure 2: The architecture of disentangler. $\bigoplus$ denotes the summation of the feature dimensions.
  • Figure 3: Visualization of the m-Transformer, mRASP2 w/o AA, and our system after dimension reduction. The subfigure captions name the model modules, and dimension reduction is applied to the outputs of these modules. The blue line denotes German (De), the orange line Italian (It), the green line Dutch (Nl), and the purple line Romanian (Ro).
  • Figure 4: Case study of zero-shot translation. We identify three off-target types, categorized by the position and ratio of the off-target content.
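The off-target rate mentioned in the TL;DR, and the position and ratio used to categorize off-target cases in Figure 4, can be computed from language labels once a language identifier has tagged each hypothesis (or each token). The sketch below assumes those labels are already available; the identifier itself is not specified here, the function names are hypothetical, and the paper's three-type taxonomy is not reproduced.

```python
from typing import Optional, Sequence


def off_target_rate(detected_langs: Sequence[str], target_lang: str) -> float:
    """Fraction of hypotheses whose detected language is not the target."""
    if not detected_langs:
        raise ValueError("need at least one hypothesis")
    misses = sum(1 for lang in detected_langs if lang != target_lang)
    return misses / len(detected_langs)


def token_off_target_ratio(token_langs: Sequence[str], target_lang: str) -> float:
    """Share of tokens in one hypothesis tagged with a non-target language."""
    if not token_langs:
        return 0.0
    return sum(1 for lang in token_langs if lang != target_lang) / len(token_langs)


def first_off_target_position(token_langs: Sequence[str],
                              target_lang: str) -> Optional[int]:
    """Index of the first non-target token, or None if fully on-target."""
    for i, lang in enumerate(token_langs):
        if lang != target_lang:
            return i
    return None


print(off_target_rate(["de", "de", "en", "it"], "de"))          # 0.5
print(token_off_target_ratio(["nl", "nl", "en", "en"], "nl"))   # 0.5
print(first_off_target_position(["ro", "ro", "en"], "ro"))      # 2
```

Corpus-level rate, per-sentence ratio, and first off-target position are kept separate so they can be combined into whatever case taxonomy an analysis needs.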