Adjusted Scores for Discrete Langevin Algorithms

Armand Gissler, Saeed Saremi, Francis Bach

TL;DR

This work reframes discrete sampling on the hypercube as discretized continuous-time dynamics and introduces Gibbs and Glauber scores to drive four score-based schemes (DULA, DUPS, DMALA, DMAPS). By relating these algorithms to Glauber dynamics and analyzing their contraction and bias, it establishes conditions under which unadjusted and Metropolis-adjusted variants converge to the target $p(\cdot)$ with favorable rates. Theoretical results show small-step guarantees and zero-bias behavior for the Glauber score, while experiments across independent bits, Ising, and Curie–Weiss models demonstrate practical advantages over traditional Gibbs sampling, particularly for moderate-to-large step-sizes. This work thus provides a principled foundation for discrete Langevin methods with provable convergence properties and practical efficacy in discrete probabilistic models.

Abstract

Sampling from discrete distributions is a ubiquitous task in machine learning, recently revisited by the emergence of discrete diffusion models. While Langevin algorithms constitute the state of the art for continuous spaces, discrete versions lack similar theoretical guarantees when the step-size becomes small. In this paper, we address this limitation by interpreting discrete sampling algorithms as discretizations of continuous-time dynamics on the hypercube. In particular, we describe several score functions for discrete algorithms which result in approximations of Glauber dynamics for the correct target distribution. We also compute upper bounds for the contraction of these algorithms, with or without Metropolis adjustment.
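As a point of reference for the Gibbs sampling baseline and the Glauber dynamics mentioned above, here is a minimal sketch of random-scan Gibbs (Glauber) updates for a Curie–Weiss model. It is not the paper's DULA, DUPS, DMALA, or DMAPS scheme. The hypercube convention $\{-1,+1\}^d$, the target $p(x)\propto\exp\big(\tfrac{\beta}{2d}(\sum_i x_i)^2\big)$, and the names `glauber_step` and `run_chain` are assumptions made for illustration.

```python
import math
import random


def glauber_step(x, beta, rng):
    """One random-scan Gibbs (Glauber) update, in place, on {-1, +1}^d.

    Assumed target (illustrative Curie-Weiss parameterization):
        p(x) proportional to exp(beta / (2 * d) * (sum_i x_i) ** 2)
    """
    d = len(x)
    i = rng.randrange(d)        # pick one coordinate uniformly at random
    m_rest = sum(x) - x[i]      # sum of all the other spins
    # Conditional law of x_i given the rest:
    # p(x_i = +1) / p(x_i = -1) = exp(2 * beta * m_rest / d)
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * m_rest / d))
    x[i] = 1 if rng.random() < p_plus else -1
    return x


def run_chain(d=20, beta=0.5, n_steps=5000, seed=0):
    """Run the chain from a uniform random start and record the magnetization."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(d)]
    mags = []
    for _ in range(n_steps):
        glauber_step(x, beta, rng)
        mags.append(sum(x) / d)  # mean spin after each update
    return x, mags


if __name__ == "__main__":
    final_state, mags = run_chain()
    print("final state:", final_state)
    print("mean magnetization over the run:", sum(mags) / len(mags))
```

Each step changes at most one coordinate. The score-based schemes summarized above are compared against this kind of single-site sampler in the experiments.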

Paper Structure

This paper contains 25 sections, 12 theorems, 80 equations, and 3 figures.

Key Result

Theorem 1

We have the following contraction property when $d\beta_2\leqslant1$ and $e^{-2/\eta}\leqslant1/d$:
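The step-size condition in this statement can be put in closed form. A short derivation, assuming $\eta>0$ denotes the step-size and $d\geqslant2$ the dimension (the excerpt does not define either symbol, so this reading is an assumption):

$$e^{-2/\eta}\leqslant\frac{1}{d} \iff -\frac{2}{\eta}\leqslant-\log d \iff \eta\leqslant\frac{2}{\log d}.$$

Under that reading, the largest admissible step-size decays like $1/\log d$ as the dimension grows.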

Figures (3)

  • Figure 1: Left: Distance to the target for DULA, DUPS and Gibbs sampling in the mixture model. Center: Mixing times for DULA, DUPS and Gibbs sampling in the mixture model. Right: Mixing times for DMALA, DMAPS and Gibbs sampling in the mixture model.
  • Figure 2: Distance to the target and mixing times for DULA, DUPS and Gibbs sampling in the Ising model.
  • Figure 3: Distance to the target and mixing times for DULA, DUPS and Gibbs sampling in the Curie–Weiss model.

Theorems & Definitions (15)

  • Theorem 1: Contraction of Gibbs sampling
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • proof
  • ...and 5 more