Unlocking the Power of GANs in Non-Autoregressive Text Generation

Da Ren, Yi Cai, Qing Li

TL;DR

This paper proposes Adversarial Non-autoregressive Transformer (ANT), a GAN-based non-autoregressive (NAR) model. Experiments show that ANT achieves performance comparable to mainstream models in a single forward pass and has potential in applications such as latent interpolation and semi-supervised learning.

Abstract

Generative Adversarial Networks (GANs) have been studied in text generation to tackle the exposure bias problem. Despite their remarkable development, existing language GANs adopt autoregressive structures and therefore suffer from high latency in both the training and inference stages. Although GANs have the potential to support efficient generation by adopting non-autoregressive (NAR) structures, their exploration in NAR models has been extremely limited. In this work, we conduct a pioneering study of building language GANs based on NAR structures. We identify two issues that constrain the performance of GAN-based NAR models. First, existing methods of incorporating latent variables provide highly similar representations across positions, which cannot describe the diversity of different words in a sentence. We tackle this problem by proposing Position-Aware Self-Modulation, which provides more diverse and effective representations. Second, the attention mechanism in the Transformer cannot accurately build word dependencies under the unstable training of GANs, so we adopt a Dependency Feed Forward Network to enhance the model's capacity for dependency modeling. Armed with these two facilities, we propose a GAN-based NAR model, Adversarial Non-autoregressive Transformer (ANT). The experimental results demonstrate that ANT can achieve performance comparable to mainstream models in a single forward pass and has great potential in applications such as latent interpolation and semi-supervised learning.
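
The first issue named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that plain self-modulation derives a shift vector from the latent variable z alone, and that a position-aware variant derives it from z plus a sinusoidal position encoding. The names `DIM`, `SEQ_LEN`, `w_shift`, `plain`, and `aware` are hypothetical, and a single random linear map stands in for whatever modulation network the paper uses.

```python
import math
import random

DIM = 8      # hidden size (hypothetical value)
SEQ_LEN = 4  # number of output positions (hypothetical value)


def positional_encoding(pos, dim=DIM):
    """Standard sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def linear(weight, vec):
    """Matrix-vector product; `weight` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(DIM)]  # one latent vector shared by all positions
# Random linear map standing in for the modulation network (assumption).
w_shift = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]

# Plain self-modulation: the shift depends on z only,
# so every position receives the same vector.
plain = [linear(w_shift, z) for _ in range(SEQ_LEN)]

# Position-aware variant (assumed form): the shift depends on z plus
# the position encoding, so each position receives a different vector.
aware = [
    linear(w_shift, [zi + pi for zi, pi in zip(z, positional_encoding(pos))])
    for pos in range(SEQ_LEN)
]

plain_sim = cosine(plain[0], plain[3])
aware_sim = cosine(aware[0], aware[3])
print(f"plain: {plain_sim:.3f}  position-aware: {aware_sim:.3f}")
```

Under these assumptions, the plain variant gives identical modulation vectors at every position, while the position-aware variant gives vectors that differ between positions. This is the kind of contrast the cosine-similarity comparison in Figure 2 describes.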

Paper Structure

This paper contains 21 sections, 5 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Structure of Adversarial Non-autoregressive Transformer (ANT)
  • Figure 2: Cosine similarity of the output from (a) Self-Modulation; and (b) Position-Aware Self-Modulation.
  • Figure 3: Proposed Facilities in ANT.
  • Figure 4: Additional Experimental Results. (a) Model Performance at Various Temperatures. (b) Least Coverage Rate.
  • Figure 5: Ablation study of (a) Dependency FFN, and (b) Position-Aware Self-Modulation.
  • ...and 1 more figure