
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

TL;DR

PLANNER is a model that combines latent semantic diffusion with autoregressive generation: it produces fluent text while exercising global control over paragraphs, so that high-quality long-form text can be generated efficiently.

Abstract

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation. This issue is often attributed to exposure bias: the difference between how a model is trained and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they can be computationally expensive, and prior efforts on text have led to models that produce less fluent output than autoregressive models, especially for longer text and paragraphs. In this paper, we propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text while exercising global control over paragraphs. The model achieves this by combining an autoregressive "decoding" module with a "planning" module that uses latent diffusion to generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text in an efficient manner.
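The "planning" step can be pictured as an ordinary diffusion sampler running over a fixed-size block of paragraph latent codes instead of over tokens. The sketch below is a minimal illustration under stated assumptions: a linear beta schedule, noise (epsilon) prediction, and ancestral DDPM sampling are generic choices, not details confirmed by this summary. The `eps_model` interface and the `oracle_eps` stand-in are hypothetical; in PLANNER a trained transformer denoiser would take that role, and its output would then be passed to the autoregressive decoder. Denoising from pure noise toward a clean latent is what gives the coarse-to-fine behaviour.

```python
import numpy as np


class LatentDDPM:
    """Toy DDPM over a (k, d) block of paragraph latent codes.

    Assumptions (not taken from the paper): linear beta schedule,
    epsilon-prediction, and ancestral DDPM sampling.
    """

    def __init__(self, T=1000, beta_start=1e-4, beta_end=0.02):
        self.T = T
        self.betas = np.linspace(beta_start, beta_end, T)
        self.alphas = 1.0 - self.betas
        self.alpha_bars = np.cumprod(self.alphas)
        # alpha_bar_{t-1}, with the value before the first step defined as 1
        self.alpha_bars_prev = np.concatenate([[1.0], self.alpha_bars[:-1]])
        # Variance of the Gaussian posterior q(z_{t-1} | z_t, z_0)
        self.post_var = self.betas * (1.0 - self.alpha_bars_prev) / (1.0 - self.alpha_bars)

    def q_sample(self, z0, t, noise):
        """Forward process: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise."""
        ab = self.alpha_bars[t]
        return np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * noise

    def sample(self, eps_model, shape, rng):
        """Reverse process from pure noise; eps_model(z_t, t) predicts the added noise."""
        z = rng.standard_normal(shape)
        for t in reversed(range(self.T)):
            ab, ab_prev = self.alpha_bars[t], self.alpha_bars_prev[t]
            eps = eps_model(z, t)
            # Clean-latent estimate implied by the predicted noise
            z0_hat = (z - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
            # Posterior mean of q(z_{t-1} | z_t, z0_hat)
            mean = (
                (np.sqrt(ab_prev) * self.betas[t] / (1.0 - ab)) * z0_hat
                + (np.sqrt(self.alphas[t]) * (1.0 - ab_prev) / (1.0 - ab)) * z
            )
            # No fresh noise on the final step (post_var[0] is 0 anyway)
            noise = rng.standard_normal(shape) if t > 0 else 0.0
            z = mean + np.sqrt(self.post_var[t]) * noise
        return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ddpm = LatentDDPM(T=1000)
    k, d = 16, 64  # hypothetical: 16 latent codes of width 64
    target = rng.standard_normal((k, d))

    # Hypothetical stand-in for a trained denoiser: the exact noise predictor
    # when the data distribution is a single point at `target`.
    def oracle_eps(z, t):
        ab = ddpm.alpha_bars[t]
        return (z - np.sqrt(ab) * target) / np.sqrt(1.0 - ab)

    z = ddpm.sample(oracle_eps, (k, d), rng)
    print("max |z - target|:", np.abs(z - target).max())
```

With the oracle denoiser the sampler lands on `target` (up to floating-point error), which is a quick sanity check that the forward and reverse coefficients agree; a real planner would instead condition its denoiser on the prompt or source document.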

Paper Structure (44 sections, 3 equations, 7 figures, 15 tables)

Figures (7)

  • Figure 1: Left: with a repetitive prompt, the finetuned GPT-2 large model (774M, FT) is still attracted to self-reinforced repetition (highlighted text) even under top-p sampling (K=50, p=0.92). Right: the most frequent first n-grams of the generations for each method, derived from 512 generation roll-outs. Our proposed method yields more diverse generations that are robust to the ill-composed prompt.
  • Figure 2: Model overview. Left: a variational paragraph embedder is learned to encode a paragraph into a fixed number of latent codes. Right: a latent diffusion model built from transformer blocks is applied to generate the latent codes. The decoder finally translates them into text. (BOS: Beginning of Sentence token; EOS: End of Sentence token)
  • Figure 3: Impact of the proportion of injected noise for learning Paragraph Embeddings on the XSum dataset. A larger proportion of substitution noise results in worse $\textbf{BLEU}_{\texttt{clean}}$ but better $\textbf{BLEU}_{\texttt{robust}}$ and $\textbf{PPL}_{\texttt{int}}$.
  • Figure 4: Impact of the proportion of injected noise for learning Paragraph Embeddings on the XSum dataset: $\textbf{PPL}_{\texttt{int}}$ and the PPL of the generations obtained by training PLANNER on the corresponding ${\bm{z}}$ at different noise levels.
  • Figure 5: The evolution of the metrics during CNN-DM summarization generation from PLANNER based on 256 samples. We observe a clear trend of improvement in the majority of the metrics as the generation process advances.
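The captions of Figures 2 to 4 describe a variational paragraph embedder trained with substitution noise, where the proportion of substituted tokens trades clean reconstruction ($\textbf{BLEU}_{\texttt{clean}}$) against robustness ($\textbf{BLEU}_{\texttt{robust}}$). The sketch below illustrates the generic ingredients only, and every detail in it is an assumption: tokens are replaced independently with probability `p` by uniformly random vocabulary ids, a hypothetical `latent_codes_from_hidden` keeps the first `k` encoder states as the fixed number of latent codes, and the latents are drawn with the standard VAE reparameterization and regularized by a KL term. The random `hidden` matrix stands in for a real transformer encoder, and the vocabulary size is illustrative.

```python
import numpy as np


def substitute_tokens(token_ids, p, vocab_size, rng):
    """Replace each token, independently with probability p, by a uniformly random id."""
    token_ids = np.asarray(token_ids)
    mask = rng.random(token_ids.shape) < p
    random_ids = rng.integers(0, vocab_size, size=token_ids.shape)
    return np.where(mask, random_ids, token_ids), mask


def latent_codes_from_hidden(hidden, k, W_mu, W_logvar):
    """Hypothetical pooling: keep the first k encoder states, project each to (mu, logvar)."""
    h = hidden[:k]  # (k, d_model)
    return h @ W_mu, h @ W_logvar  # each (k, d_latent)


def reparameterize(mu, logvar, rng):
    """VAE reparameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over all latent entries."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes (assumptions, not the paper's configuration)
    vocab_size, seq_len, d_model, k, d_latent = 30522, 64, 32, 16, 8
    tokens = rng.integers(0, vocab_size, size=seq_len)
    noisy, mask = substitute_tokens(tokens, p=0.3, vocab_size=vocab_size, rng=rng)
    # Random stand-in for the encoder's output on `noisy`
    hidden = rng.standard_normal((seq_len, d_model))
    W_mu = 0.1 * rng.standard_normal((d_model, d_latent))
    W_logvar = 0.1 * rng.standard_normal((d_model, d_latent))
    mu, logvar = latent_codes_from_hidden(hidden, k, W_mu, W_logvar)
    z = reparameterize(mu, logvar, rng)
    print(z.shape)  # (16, 8)
    print("substituted fraction:", mask.mean(), "KL:", kl_to_standard_normal(mu, logvar))
```

Raising `p` corrupts more of the input, so a decoder asked to reconstruct the original paragraph from `z` must rely on coarser, more robust semantics; that is one plausible reading of the trend in Figure 3, not a claim about the paper's exact training objective.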