Discrete Neural Flow Samplers with Locally Equivariant Transformer

Zijing Ou, Ruixiang Zhang, Yingzhen Li

TL;DR

Discrete Neural Flow Samplers (DNFS) address sampling from unnormalised discrete distributions by learning the rate matrix $R_t$ of a continuous-time Markov chain (CTMC) that transports a prior to the target along an annealing path while enforcing the Kolmogorov forward equation. To cope with the intractable partition function $Z_t$, the approach employs control variates and a coordinate-descent learning scheme, and it reduces computational cost using locally equivariant networks that implement a one-way rate matrix via a Locally Equivariant Transformer (leTF). The method is demonstrated on sampling from unnormalised distributions, training discrete energy-based models (EBMs), and solving combinatorial optimisation problems (COPs), with graph-aware extensions (leGF) enabling COPs such as maximum independent set (MIS) and MaxCut. DNFS provides competitive sampling quality, enables end-to-end training of EBMs with neural samplers, and offers a scalable, data-free alternative to diffusion-inspired discrete samplers, with potential for MCMC refinement and broader discrete-domain applicability.

Abstract

Sampling from unnormalised discrete distributions is a fundamental problem across various domains. While Markov chain Monte Carlo offers a principled approach, it often suffers from slow mixing and poor convergence. In this paper, we propose Discrete Neural Flow Samplers (DNFS), a trainable and efficient framework for discrete sampling. DNFS learns the rate matrix of a continuous-time Markov chain such that the resulting dynamics satisfy the Kolmogorov equation. As this objective involves the intractable partition function, we employ control variates to reduce the variance of its Monte Carlo estimation, leading to a coordinate descent learning algorithm. To further improve computational efficiency, we propose the locally equivariant Transformer, a novel parameterisation of the rate matrix that significantly improves training efficiency while preserving powerful network expressiveness. Empirically, we demonstrate the efficacy of DNFS in a wide range of applications, including sampling from unnormalised distributions, training discrete energy-based models, and solving combinatorial optimisation problems.
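To make the role of the Kolmogorov equation and of control variates concrete, the following is a minimal sketch on a six-state toy problem, not the paper's estimator or training loop. It assumes a geometric annealing path $\tilde p_t = \tilde p_0^{1-t}\tilde p_1^{t}$, a hand-built rate matrix (`path_rate_matrix`, a hypothetical helper) that satisfies the Kolmogorov equation exactly, and the convention that `R[y, x]` is the rate of jumping from `x` to `y`, so columns sum to zero. Under those assumptions, the term $g(x) = \sum_y R(x,y)\,p_t(y)/p_t(x)$ has zero mean under $p_t$ for any rate matrix, and subtracting it from $\partial_t \log \tilde p_t(x)$ leaves a constant equal to $\partial_t \log Z_t$ when $R$ generates the path.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # toy state space {0, ..., n-1}
log_p0 = np.zeros(n)                    # unnormalised uniform prior (assumed)
log_p1 = rng.normal(size=n)             # unnormalised toy target (assumed)

def log_p_tilde(t):
    # Geometric annealing path (assumed): log p~_t = (1 - t) log p~_0 + t log p~_1
    return (1.0 - t) * log_p0 + t * log_p1

def normalise(log_w):
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def path_rate_matrix(p, dp):
    """Hypothetical helper: one rate matrix with R @ p == dp (columns = source state)."""
    pos, neg = np.maximum(dp, 0.0), np.maximum(-dp, 0.0)
    R = np.outer(pos, neg / p) / pos.sum()   # R[y, x] = pos[y] * neg[x] / (p[x] * S)
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(R, -R.sum(axis=0))      # each column sums to zero
    return R

t = 0.3
p = normalise(log_p_tilde(t))
f = log_p1 - log_p0                     # d/dt log p~_t(x) on the geometric path
dlogZ = p @ f                           # exact d/dt log Z_t = E_{p_t}[f]
dp = p * (f - dlogZ)                    # exact d/dt p_t

R = path_rate_matrix(p, dp)
g = (R @ p) / p                         # g(x) = sum_y R(x, y) p(y) / p(x)
xi = f - g                              # control-variate-corrected integrand

stein_mean = p @ g                      # zero for any rate matrix with zero column sums
var_naive = p @ (f - dlogZ) ** 2        # variance of the plain integrand under p_t
var_cv = p @ (xi - p @ xi) ** 2         # variance after subtracting the control variate

print(f"E[g] = {stein_mean:.2e}, Var naive = {var_naive:.3f}, Var with CV = {var_cv:.2e}")
```

In this toy setting the corrected integrand has essentially zero variance because the rate matrix is exact; with a learned, imperfect rate matrix the variance would only be reduced, not removed.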

Paper Structure

This paper contains 38 sections, 8 theorems, 74 equations, 17 figures, 8 tables, 2 algorithms.

Key Result

Proposition 1

For a rate matrix $R_t$ that generates the probabilistic path $p_t$, there exists a one-way rate matrix $Q_t (y,x)=\left[R_t(y,x) - R_t(x,y) \frac{p_t(y)}{p_t(x)}\right]_+$ if $y\ne x$ and $Q_t (x,x) = -\sum_{y\ne x} Q_t (y,x)$, that generates the same probabilistic path $p_t$, where $[z]_+ = \mathrm
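The construction in Proposition 1 can be checked numerically at a single time point. The sketch below is illustrative only: it assumes $[z]_+$ is the positive part, that `R[y, x]` is the rate of jumping from `x` to `y` (so columns sum to zero and the diagonal is minus the off-diagonal column sum), and that "one-way" means at most one of $Q_t(y,x)$ and $Q_t(x,y)$ is nonzero for $y \ne x$. The helper names `random_rate_matrix` and `one_way_rate_matrix` are hypothetical.

```python
import numpy as np

def random_rate_matrix(n, rng):
    """Random rate matrix: nonnegative off-diagonals, columns summing to zero."""
    R = rng.random((n, n))
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(R, -R.sum(axis=0))
    return R

def one_way_rate_matrix(R, p):
    """Q(y, x) = [R(y, x) - R(x, y) p(y) / p(x)]_+ for y != x, diagonal set by column sums."""
    ratio = p[:, None] / p[None, :]          # ratio[y, x] = p(y) / p(x)
    Q = np.maximum(R - R.T * ratio, 0.0)     # R.T[y, x] = R(x, y)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=0))
    return Q

rng = np.random.default_rng(0)
n = 5
R = random_rate_matrix(n, rng)
p = rng.random(n) + 0.1                      # strictly positive toy distribution
p /= p.sum()
Q = one_way_rate_matrix(R, p)

off = ~np.eye(n, dtype=bool)
same_flow = np.allclose(Q @ p, R @ p)        # both give the same d/dt p_t at this time
one_way = bool(np.all((Q * Q.T)[off] == 0.0))  # never both Q(y, x) > 0 and Q(x, y) > 0

print("same instantaneous flow:", same_flow, "| one-way:", one_way)
```

The check works because $[a]_+ - [-a]_+ = a$: the net probability flow between any pair of states is unchanged, while the flow in the weaker direction is cancelled.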

Figures (17)

  • Figure 1: Comparison of std. dev. and training loss for different estimators of $\partial_t \log Z_t$. Lower-variance estimators exhibit lower training loss, indicating a learned rate matrix that better satisfies the Kolmogorov equation in \ref{eq:dfs_loss}.
  • Figure 2: Illustration of the leTF network.
  • Figure 3: Comparison of log RMSE $(\downarrow)$ and ESS $(\uparrow)$ for different locally equivariant networks. More expressive networks achieve better performance.
  • Figure 4: Comparison between different discrete samplers on pre-trained EBMs.
  • Figure 5: Comparison of effective sample size and histogram of sample energy on the lattice Ising model.
  • ...and 12 more figures

Theorems & Definitions (16)

  • Proposition 1
  • Proposition 2: Instantiation of Locally Equivariant Networks
  • Lemma 1: Discrete Stein Identity
  • proof
  • Lemma 2
  • proof
  • Definition 1: One-way Rate Matrix
  • Proposition 2
  • proof
  • Definition 2: Locally Equivariant Network
  • ...and 6 more