Adjusted Scores for Discrete Langevin Algorithms

Armand Gissler; Saeed Saremi; Francis Bach

TL;DR

This work reframes discrete sampling on the hypercube as discretized continuous-time dynamics and introduces Gibbs and Glauber scores to drive four score-based schemes (DULA, DUPS, DMALA, DMAPS). By relating these algorithms to Glauber dynamics and analyzing their contraction and bias, it establishes conditions under which unadjusted and Metropolis-adjusted variants converge to the target $p(\cdot)$ with favorable rates. Theoretical results show small-step guarantees and zero-bias behavior for the Glauber score, while experiments across independent bits, Ising, and Curie–Weiss models demonstrate practical advantages over traditional Gibbs sampling, particularly for moderate-to-large step-sizes. This work thus provides a principled foundation for discrete Langevin methods with provable convergence properties and practical efficacy in discrete probabilistic models.

Abstract

Sampling from discrete distributions is a ubiquitous task in machine learning, recently revisited by the emergence of discrete diffusion models. While Langevin algorithms constitute the state of the art for continuous spaces, discrete versions lack similar theoretical guarantees when the step-size becomes small. In this paper, we address this limitation by interpreting discrete sampling algorithms as discretizations of continuous-time dynamics on the hypercube. In particular, we describe several score functions for discrete algorithms which result in approximations of Glauber dynamics for the correct target distribution. We also compute upper bounds for the contraction of these algorithms, with or without Metropolis adjustment.
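For orientation, the Gibbs sampling baseline that the abstract compares against can be sketched as random-scan single-site updates on the hypercube $\{-1,+1\}^d$. The sketch below is illustrative only and is not taken from the paper: it assumes a Curie-Weiss-type target $\log p(x) = \frac{\beta}{2d}\big(\sum_i x_i\big)^2 + \text{const}$, and the function names (`curie_weiss_field`, `gibbs_step`, `run_gibbs`) and the parameter `beta` are hypothetical choices made for this example. It does not implement DULA, DUPS, DMALA, or DMAPS.

```python
import numpy as np


def curie_weiss_field(x, beta):
    """Local fields h_i = (beta / d) * sum_{j != i} x_j for x in {-1, +1}^d.

    Assumed target (illustrative): log p(x) = beta / (2 d) * (sum_i x_i)^2 + const.
    """
    d = x.shape[0]
    return beta * (x.sum() - x) / d


def gibbs_step(x, beta, rng):
    """One random-scan Gibbs update: resample a single coordinate from its conditional."""
    d = x.shape[0]
    i = rng.integers(d)  # coordinate chosen uniformly at random
    h = curie_weiss_field(x, beta)[i]
    # For this target, P(x_i = +1 | x_{-i}) = sigmoid(2 * h_i).
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * h))
    x = x.copy()
    x[i] = 1 if rng.random() < p_plus else -1
    return x


def run_gibbs(d, beta, n_steps, seed=0):
    """Run n_steps single-site updates from a uniformly random start."""
    rng = np.random.default_rng(seed)
    x = rng.choice(np.array([-1, 1]), size=d)
    for _ in range(n_steps):
        x = gibbs_step(x, beta, rng)
    return x
```

Calling `run_gibbs(d=16, beta=0.5, n_steps=200)` returns a single approximate sample as a length-16 array of $\pm 1$ entries. Each step changes at most one coordinate, which is the behaviour the score-based schemes aim to improve on by updating many coordinates per step.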

Paper Structure (25 sections, 12 theorems, 80 equations, 3 figures)

Introduction
Contributions.
Glauber dynamics and convergence of Gibbs sampling
Convergence of discrete unadjusted samplers
Contraction and approximation error of DULA
Convergence of a proximal sampler
Effect of a Metropolis adjustment on discrete Langevin algorithms
Convergence to the invariant probability distribution
Contraction Property
Experiments
(Mixture of) independent bits.
Ising model.
Curie-Weiss model.
Conclusion
Proof of Theorem 1
...and 10 more sections

Key Result

Theorem 1

We have the following contraction property when $d\beta_2\leqslant1$ and $e^{-2/\eta}\leqslant1/d$:

Figures (3)

Figure 1: Left: Distance to the target for DULA, DUPS and Gibbs sampling in the mixture model. Center: Mixing times for DULA, DUPS and Gibbs sampling in the mixture model. Right: Mixing times for DMALA, DMAPS and Gibbs sampling in the mixture model.
Figure 2: Distance to the target and mixing times for DULA, DUPS and Gibbs sampling in the Ising model.
Figure 3: Distance to the target and mixing times for DULA, DUPS and Gibbs sampling in the Curie-Weiss model.

Theorems & Definitions (15)

Theorem 1: Contraction of Gibbs sampling
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
Theorem 8
Theorem 9
proof
...and 5 more
