Principled Gradient-based Markov Chain Monte Carlo for Text Generation

Li Du; Afra Amini; Lucas Torroba Hennigen; Xinyan Velocity Yu; Jason Eisner; Holden Lee; Ryan Cotterell

TL;DR

This paper proposes several gradient-based sampling algorithms for energy-based text generation that are faithful, meaning that they have the target text distribution as their limiting distribution.

Abstract

Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as their limiting distribution. We propose several faithful gradient-based sampling algorithms that sample correctly from the target energy-based text distribution, and we study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers generate more fluent text while adhering better to the control objectives.
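To make the notion of faithfulness concrete, here is a minimal sketch on a three-state toy distribution. It is not one of the paper's samplers: the names `mh_transition_matrix` and `step_distribution`, the target `pi`, and the proposal `Q` are all hypothetical and chosen only for illustration. The sketch shows that an arbitrary proposal generally does not preserve the target, whereas adding a Metropolis-Hastings acceptance step yields a chain that leaves the target invariant.

```python
def mh_transition_matrix(pi, Q):
    """Build a Metropolis-Hastings kernel for target `pi` and proposal `Q`.

    `pi` is a list of strictly positive target probabilities and `Q` is a
    row-stochastic proposal matrix (both are illustrative assumptions).
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and Q[i][j] > 0.0:
                # Standard MH acceptance probability for the move i -> j.
                accept = min(1.0, (pi[j] * Q[j][i]) / (pi[i] * Q[i][j]))
                P[i][j] = Q[i][j] * accept
        # Rejected proposals (and any self-proposals) keep the chain at i.
        P[i][i] = 1.0 - sum(P[i])
    return P


def step_distribution(mu, P):
    """Push a distribution `mu` through one step of the chain `P`."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical toy target and proposal.
pi = [0.5, 0.3, 0.2]
Q = [
    [0.0, 0.9, 0.1],
    [0.5, 0.0, 0.5],
    [0.2, 0.8, 0.0],
]

P = mh_transition_matrix(pi, Q)
unadjusted = step_distribution(pi, Q)  # proposal alone moves mass away from pi
adjusted = step_distribution(pi, P)    # the MH-corrected chain keeps pi fixed
```

In this toy setting, `adjusted` matches `pi` up to floating-point error while `unadjusted` does not, which is the distinction between a faithful and an unfaithful sampler at the level of invariant distributions.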

Paper Structure (45 sections, 6 theorems, 61 equations, 3 figures, 2 tables)

Introduction
Energy-based Models of Text
Text Generation as MCMC
Sampling from EBMs
Gradient-based Sampling through Relaxation
Faithfulness of Gradient-based Text Samplers
qin2022cold.
kumar-2022.
amini2023structured.
Background: MCMC
Metropolis-Hastings Acceptance.
Mixing Time.
Faithful Gradient-based Text Generation
A Langevin-based Sampler
Properties of $p$-NCG
...and 30 more sections

Key Result

theorem 1

Let $\pi(\mathbf{x})$ be a discrete log-quadratic distribution as defined in def:log-quad. For any $\alpha>0$, there exists a unique distribution $\pi_\alpha(\mathbf{x})$ such that the Markov chain defined by $q$ in eq:pncg-final-form is reversible with respect to $\pi_\alpha$. Further, $\pi_\alpha \to \pi$ weakly ...
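For reference, reversibility in the theorem is the standard detailed-balance condition. The notation $q(\mathbf{x}' \mid \mathbf{x})$ for the probability of moving from state $\mathbf{x}$ to state $\mathbf{x}'$ is an assumption made here for illustration and may differ from the paper's own notation:

$$\pi_\alpha(\mathbf{x})\, q(\mathbf{x}' \mid \mathbf{x}) = \pi_\alpha(\mathbf{x}')\, q(\mathbf{x} \mid \mathbf{x}') \quad \text{for all states } \mathbf{x}, \mathbf{x}'.$$

Summing both sides over $\mathbf{x}$ gives $\sum_{\mathbf{x}} \pi_\alpha(\mathbf{x})\, q(\mathbf{x}' \mid \mathbf{x}) = \pi_\alpha(\mathbf{x}')$, so a chain that is reversible with respect to $\pi_\alpha$ also has $\pi_\alpha$ as a stationary distribution.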

Figures (3)

Figure 1: Total variation distance between $\pi_\mathrm{mcmc}$, the limiting distribution of MCMC algorithms from previous works, and $\pi_{\mathrm{toy}}$, the toy language model distribution from \ref{ex:toy-lm}. $\pi_\mathrm{mcmc}$ is computed with spectral decomposition when possible. We observe that the limiting distribution of one prior sampler is far from the target distribution, while that of another, depending on its step size $\alpha$, may be close to the target distribution. Nevertheless, it does not have the correct distribution for any $\alpha$.
Figure 2: Total variation distance between the empirical distribution of different samplers (at different steps) and $\pi_{\mathrm{toy}}$, the true distribution of the toy language model from \ref{ex:toy-lm}.
Figure 3: Energy traces of different samplers when sampling from GPT-2. We observe that the faithful samplers (including GwL) converge to an unbiased estimate of the energy (estimated using ancestral sampling). On the other hand, the energy of the remaining chain drops initially but suffers from systematic bias and is unable to converge to the true energy distribution, similar to the conclusion of our exact analysis in \ref{ex:toy-lm}.
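The quantity plotted in Figures 1 and 2 can be sketched in a few lines. The example below is an assumption-laden illustration: the function names are hypothetical, the two-state chain `P` and the `target` are made up, and the limiting distribution is approximated by plain power iteration rather than the spectral decomposition mentioned in the Figure 1 caption.

```python
def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stationary_by_power_iteration(P, iters=10_000, tol=1e-14):
    """Approximate the limiting distribution of a row-stochastic matrix `P`.

    Power iteration is a stand-in for spectral decomposition; it assumes the
    chain is irreducible and aperiodic so that the iteration converges.
    """
    n = len(P)
    mu = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if total_variation(mu, nxt) < tol:
            return nxt
        mu = nxt
    return mu


# Hypothetical two-state chain and a target it was not designed for.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
target = [0.5, 0.5]

limiting = stationary_by_power_iteration(P)
bias = total_variation(limiting, target)
```

A nonzero `bias` here plays the role of the gap shown in Figure 1: the chain converges, but to a distribution other than the target.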

Theorems & Definitions (11)

definition 1
theorem 1
proof : Proof Idea
theorem 2
proof : Proof Idea
theorem 2
proof
theorem 3: disc theorem; Theorem 6.1.1 in horn2013
theorem 3
proof
...and 1 more
