Structured Voronoi Sampling

Afra Amini; Li Du; Ryan Cotterell

TL;DR

Structured Voronoi Sampling (SVS) provides a principled gradient-based framework for text generation. It encodes discrete language-model distributions as densities over embeddings via $(\mathcal{K}, \mu)$-Voronoi measures and samples from them with a refractive Hamiltonian Monte Carlo (HMC) algorithm, which handles the discontinuities at cell boundaries through reflection and refraction. The framework extends to structured sequences and controlled generation through $p_V(\mathbf{V})$ and $p_V(\mathbf{V} \mid t)$, with a base-measure design that yields a tractable gradient $\nabla_{\mathbf{x}} \log p_V(\mathbf{x})$. The authors prove detailed balance for the SVS sampler and report empirical advantages over baselines such as MuCoLa, FUDGE, and Langevin dynamics on toy distributions, language-model sampling, and controlled-generation tasks, particularly in distributional fidelity and adherence to control targets, while maintaining fluency and diversity. The work advances a theoretically grounded MCMC approach to discrete text generation and lays groundwork for principled, controllable generation, with potential practical impact on safer and more faithful LM applications.

Abstract

Gradient-based sampling algorithms have demonstrated their effectiveness in text generation, especially in the context of controlled text generation. However, there exists a lack of theoretically grounded and principled approaches for this task. In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods. We use discrete distributions given by language models to define densities and develop an algorithm based on Hamiltonian Monte Carlo to sample from them. We name our gradient-based technique Structured Voronoi Sampling (SVS). In an experimental setup where the reference distribution is known, we show that the empirical distribution of SVS samples is closer to the reference distribution compared to alternative sampling schemes. Furthermore, in a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.
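To make the reflection/refraction idea concrete, here is a minimal sketch of the momentum update commonly used in reflective/refractive HMC when a trajectory reaches a boundary where the potential energy jumps. This is an assumption for illustration: the summary above does not spell out the exact rule SVS uses, and the names `reflect_or_refract`, `normal`, and `delta_u` are hypothetical.

```python
import math

def reflect_or_refract(momentum, normal, delta_u):
    """Update the momentum at a boundary with potential jump delta_u.

    momentum: list of floats (current momentum).
    normal:   unit normal of the boundary (assumed to have norm 1).
    delta_u:  potential energy on the far side minus the near side.
    Returns (new_momentum, crossed), where crossed says whether the
    trajectory continues into the neighbouring region.
    """
    # Split the momentum into components perpendicular and parallel to the boundary.
    p_perp_mag = sum(p * n for p, n in zip(momentum, normal))
    p_perp = [p_perp_mag * n for n in normal]
    p_par = [p - q for p, q in zip(momentum, p_perp)]

    if p_perp_mag ** 2 > 2.0 * delta_u:
        # Refraction: the perpendicular kinetic energy covers the potential jump,
        # so the trajectory crosses and the perpendicular momentum is rescaled
        # to keep the total energy constant.
        new_mag = math.copysign(math.sqrt(p_perp_mag ** 2 - 2.0 * delta_u), p_perp_mag)
        return [a + new_mag * n for a, n in zip(p_par, normal)], True

    # Reflection: not enough energy to cross, so flip the perpendicular component.
    return [a - b for a, b in zip(p_par, p_perp)], False
```

In both branches the total energy (kinetic plus potential) is unchanged, which is the property that lets such a step fit inside an HMC proposal.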

Paper Structure

This paper contains 63 sections, 13 theorems, 47 equations, 8 figures, 7 tables, 6 algorithms.

Introduction
Language Models
Language Modeling with Embeddings
Controlled Language Modeling with Embeddings
Voronoi Measures
Structured Voronoi Measures
Application to Text Generation
Base Measure.
Gradient-Based Sampling
Hamiltonian Monte Carlo
Langevin Dynamics.
Applications to Controlled Generation
MuCoLa.
cold and other schemes.
Structured Voronoi Sampling
...and 48 more sections

Key Result

Proposition 1

Let $\boldsymbol{p} = [p_1, \ldots, p_M]$ be an embedding-augmented distribution with embeddings $\{\mathbf{v}_m\}_{m=1}^M \subset \mathbb{R}^d$, and let $p_{\mathrm{V}}$ be the corresponding Voronoi measure (eq:voronoi-measure). Then $p_{\mathrm{V}}(C_m) = p_m$, where $C_m$ is defined as in eq:vorono
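A small Monte Carlo check can illustrate what Proposition 1 asserts. The sketch below assumes that each Voronoi cell $C_m$ is the set of points nearest to $\mathbf{v}_m$ in Euclidean distance and that, within a cell, the density is a Gaussian centred at $\mathbf{v}_m$ truncated to $C_m$ and scaled to carry mass $p_m$. The Gaussian base measure, the toy embeddings, and the helper names are assumptions for illustration, not taken from the text above.

```python
import random

def nearest_center(x, centers):
    """Index of the Voronoi cell containing x (nearest center, squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda m: sum((a - b) ** 2 for a, b in zip(x, centers[m])),
    )

def sample_voronoi(probs, centers, rng, sigma=1.0):
    """Pick a cell m with probability probs[m], then rejection-sample a
    Gaussian centred at centers[m] until the draw lands inside cell m."""
    m = rng.choices(range(len(probs)), weights=probs)[0]
    while True:
        x = [rng.gauss(c, sigma) for c in centers[m]]
        if nearest_center(x, centers) == m:
            return x

rng = random.Random(0)
probs = [0.6, 0.3, 0.1]                          # toy discrete distribution
centers = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]   # toy 2-D embeddings
n = 5000

# Empirical mass of each cell under the continuous sampler.
counts = [0] * len(probs)
for _ in range(n):
    counts[nearest_center(sample_voronoi(probs, centers, rng), centers)] += 1
freqs = [c / n for c in counts]
```

Under these assumptions, `freqs` should approach `probs` as `n` grows, which is the statement $p_{\mathrm{V}}(C_m) = p_m$ in empirical form.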

Figures (8)

Figure 1: An example of how Voronoi Sampling navigates the space to sample one embedding in $\mathbb{R}^2$. Each Voronoi cell is annotated with the probability of its center, i.e., the Voronoi measure of the cell.
Figure 2: Left: JS divergence between the reference probability distribution and the empirical probability distribution. Voronoi Sampling clearly outperforms the other methods at low temperatures. Right: the reference probability distribution annealed with $3$ temperatures: $0.25$ (peaked), $1$, and $2$ (close to uniform).
Figure 3: Perplexity of $100$ samples taken with different gradient-based algorithms, compared to $1000$ samples taken with ancestral sampling (in green). While Langevin, SVS, and MuCoLa are all comparably close to the distribution of the ancestral samples, SVS models the tail of the distribution better.
Figure 4: JS divergence of each method for different numbers of iterations. In general, Voronoi Sampling converges to the true distribution faster than HMC and MuCoLa. As the number of iterations increases, the divergence between the samples' distribution and the true distribution decreases for all sampling methods.
Figure 5: Distribution of sampled elements at temperature $0.25$. With $100$ iterations, MuCoLa undersamples the element with the highest probability while oversampling the other elements.
...and 3 more figures
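The captions for Figures 2, 4, and 5 rely on two quantities: a reference distribution annealed with a temperature, and the Jensen-Shannon (JS) divergence between the reference and empirical distributions. A minimal sketch of both follows. It assumes annealing means raising each probability to the power $1/T$ and renormalising, and that the divergence uses natural logarithms; the captions do not state either convention.

```python
import math

def anneal(probs, temperature):
    """Anneal a discrete distribution: p_i^(1/T), renormalised.
    T < 1 sharpens the distribution; T > 1 flattens it toward uniform."""
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

reference = [0.5, 0.3, 0.2]
peaked = anneal(reference, 0.25)   # temperature 0.25, as in the captions
flat = anneal(reference, 2.0)      # temperature 2, closer to uniform
```

The JS divergence is symmetric and bounded above by $\log 2$ in nats, which makes it convenient for comparing an empirical histogram of samples with a known reference.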

Theorems & Definitions (26)

Definition 1
Definition 2
Proposition 1
Example 1
Proposition 2
Definition 3
Proposition 3
Proposition 4
Proposition 4
proof
...and 16 more
