Unsupervised Neural Machine Translation with Weight Sharing

Zhen Yang, Wei Chen, Feng Wang, Bo Xu

TL;DR

This work tackles unsupervised neural machine translation by challenging the standard single shared encoder approach. It introduces a dual-encoder/dual-decoder architecture with partial weight sharing, embedding reinforcement, and directional self-attention, augmented by local and global GANs to align cross-language representations and outputs. Through denoising auto-encoding, back-translation, and adversarial training in a two-stage process, the method achieves notable improvements on English-German, English-French, and Chinese-to-English, illustrating that preserving language-specific features while sharing a high-level latent space is beneficial. The results demonstrate a meaningful step toward effective unsupervised NMT with practical implications for low-resource languages, while also outlining avenues for incorporating language models and syntactic information in future work.

Abstract

Unsupervised neural machine translation (NMT) is a recently proposed approach to machine translation that aims to train the model without using any labeled data. The models proposed for unsupervised NMT often use only one shared encoder to map sentences from different languages into a shared latent space, which makes it hard to preserve the unique, internal characteristics of each language, such as style, terminology, and sentence structure. To address this issue, we introduce an extension that uses two independent encoders which share only a subset of their weights, namely those responsible for extracting high-level representations of the input sentences. In addition, two different generative adversarial networks (GANs), the local GAN and the global GAN, are proposed to enhance cross-language translation. With this new approach, we achieve significant improvements on English-German, English-French and Chinese-to-English translation tasks.

Paper Structure

This paper contains 18 sections, 12 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: The architecture of the proposed model. We implement the shared-latent space assumption using a weight-sharing constraint in which the connections of the last few layers in $Enc_s$ and $Enc_t$ are tied (illustrated with dashed lines) and the connections of the first few layers in $Dec_s$ and $Dec_t$ are tied. $\tilde{x}_s^{Enc_s-Dec_s}$ and $\tilde{x}_t^{Enc_t-Dec_t}$ are the self-reconstructed sentences in each language. $\tilde{x}_s^{Enc_s-Dec_t}$ is the sentence translated from source to target, and $\tilde{x}_t^{Enc_t-Dec_s}$ is the translation in the reverse direction. $D_l$ assesses whether the hidden representation produced by the encoder comes from the source or the target language. $D_{g1}$ and $D_{g2}$ evaluate whether the translated sentences are realistic for each language, respectively. $Z$ represents the shared latent space.
  • Figure 2: The effects of the weight-sharing layer number on English-to-German, English-to-French and Chinese-to-English translation tasks.
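
The weight-sharing constraint described in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: the scalar `Layer`, the `Stack` container, the `build_model` helper, and the parameters `n_layers` and `n_shared` are all hypothetical names chosen for illustration, and each "layer" is a simple affine map standing in for a real encoder or decoder layer. The only idea it demonstrates is that tying weights amounts to placing the same layer objects at the top of both encoders and at the bottom of both decoders.

```python
import random


class Layer:
    """Toy element-wise affine layer y = w * x + b (stand-in for a real NMT layer)."""

    def __init__(self, rng):
        self.w = rng.uniform(-1.0, 1.0)
        self.b = rng.uniform(-1.0, 1.0)

    def __call__(self, xs):
        return [self.w * x + self.b for x in xs]


class Stack:
    """Applies a list of layers in order."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, xs):
        for layer in self.layers:
            xs = layer(xs)
        return xs


def build_model(n_layers=4, n_shared=1, seed=0):
    """Build Enc_s, Enc_t, Dec_s, Dec_t with tied layers (hypothetical helper).

    The last `n_shared` encoder layers and the first `n_shared` decoder
    layers are the *same objects* in both languages; the rest are private.
    """
    if not 0 <= n_shared <= n_layers:
        raise ValueError("n_shared must be between 0 and n_layers")
    rng = random.Random(seed)
    n_private = n_layers - n_shared
    shared_enc = [Layer(rng) for _ in range(n_shared)]  # tied top of the encoders
    shared_dec = [Layer(rng) for _ in range(n_shared)]  # tied bottom of the decoders
    enc_s = Stack([Layer(rng) for _ in range(n_private)] + shared_enc)
    enc_t = Stack([Layer(rng) for _ in range(n_private)] + shared_enc)
    dec_s = Stack(shared_dec + [Layer(rng) for _ in range(n_private)])
    dec_t = Stack(shared_dec + [Layer(rng) for _ in range(n_private)])
    return enc_s, enc_t, dec_s, dec_t


enc_s, enc_t, dec_s, dec_t = build_model(n_layers=4, n_shared=1)
x_s = [0.5, -1.0, 2.0]          # toy "source sentence" as a list of numbers

z = enc_s(x_s)                  # representation in the shared latent space Z
recon_s = dec_s(z)              # Enc_s-Dec_s path: self-reconstruction
trans_st = dec_t(z)             # Enc_s-Dec_t path: source-to-target translation
```

Because the tied layers are shared objects, any update made to them while training one language is immediately seen by the other, whereas the private layers remain free to model language-specific features. Varying `n_shared` in this sketch corresponds loosely to the number of weight-sharing layers studied in Figure 2.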