Discrete generative diffusion models without stochastic differential equations: a tensor network approach

Luke Causer; Grant M. Rotskoff; Juan P. Garrahan

TL;DR

This work develops tensor-network-based discrete diffusion models (DDMs) to sample lattice systems with discrete degrees of freedom without solving stochastic differential equations. By representing both probability vectors $P_{\bm \theta}$ and evolution operators as matrix product states/operators, the forward noising channel $\boldsymbol{\text{W}}$ and the reverse denoising channel $\boldsymbol{\text{W}}^{\boldsymbol \theta}_{\tilde t}$ can be implemented exactly within TN contractions, and sampling is achieved via autoregressive TN samplers. The authors integrate DDM proposals with MCMC through two update schemes (disconnected and connected), demonstrating that the connected variant yields higher acceptance and controllable correlation through the denoising time $T$. They apply the framework to the Fredkin spin chain and the 2D Ising model on a cylinder, showing how a learnable MPS $P_{\bm \theta}$ can be optimized via negative log-likelihood and gradient-based methods to approximate target Boltzmann distributions efficiently across phases. This approach offers a scalable route to Boltzmann-like sampling on discrete lattices and motivates extensions to higher dimensions and other TN topologies (e.g., TTN, PEPS).
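As a rough illustration of what "autoregressive TN sampler" means here, the following sketch draws unbiased samples from a small MPS with the positivity ansatz $P(\bm\sigma) \propto \psi(\bm\sigma)^2$. It is a minimal example under stated assumptions, not the authors' implementation: the tensors are random and real, the boundaries are open, and the helper names (`random_mps`, `right_environments`, `sample`) are hypothetical.

```python
import numpy as np

def random_mps(n_sites, phys_dim=2, bond_dim=3, seed=0):
    """Random real MPS; tensor j has shape (D_left, phys_dim, D_right)."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[j], phys_dim, dims[j + 1]))
            for j in range(n_sites)]

def right_environments(mps):
    """envs[j] sums psi^2 over sites j..N-1, leaving the two left bonds open."""
    envs = [None] * (len(mps) + 1)
    envs[-1] = np.ones((1, 1))
    for j in range(len(mps) - 1, -1, -1):
        A = mps[j]
        envs[j] = np.einsum("asb,csd,bd->ac", A, A, envs[j + 1])
    return envs

def sample(mps, envs, rng):
    """Draw one configuration site by site; also return its exact probability."""
    left = np.ones(1)          # amplitude of the already-sampled prefix
    config, prob = [], 1.0
    for j, A in enumerate(mps):
        v = np.einsum("a,asb->sb", left, A)              # one vector per local state
        weights = np.einsum("sb,bd,sd->s", v, envs[j + 1], v)
        cond = weights / weights.sum()                   # P(s_j | s_1..s_{j-1})
        s = int(rng.choice(len(cond), p=cond))
        config.append(s)
        prob *= cond[s]
        left = v[s]
    return config, prob

mps = random_mps(n_sites=6)
envs = right_environments(mps)
config, prob = sample(mps, envs, np.random.default_rng(42))
```

Each draw costs one sweep over the chain, and the product of the conditionals equals $\psi(\bm\sigma)^2 / Z$ exactly, which is why no Markov chain burn-in is needed for the model distribution itself.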

Abstract

Diffusion models (DMs) are a class of generative machine learning methods that sample a target distribution by transforming samples of a trivial (often Gaussian) distribution using a learned stochastic differential equation. In standard DMs, this is done by learning a "score function" that reverses the effect of adding diffusive noise to the distribution of interest. Here we consider the generalisation of DMs to lattice systems with discrete degrees of freedom, where noise is added via Markov chain jump dynamics. We show how to use tensor networks (TNs) to efficiently define and sample such "discrete diffusion models" (DDMs) without explicitly having to solve a stochastic differential equation. We show the following: (i) by parametrising the data and evolution operators as TNs, the denoising dynamics can be represented exactly; (ii) the auto-regressive nature of TNs allows samples to be generated efficiently and without bias; (iii) for sampling Boltzmann-like distributions, TNs allow the construction of an efficient learning scheme that integrates well with Monte Carlo. We illustrate this approach by studying the equilibrium of two models with non-trivial thermodynamics, the $d=1$ constrained Fredkin chain and the $d=2$ Ising model.
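To make the phrase "noise is added via Markov chain jump dynamics" concrete, the sketch below works through a three-spin toy case by brute-force enumeration rather than with tensor networks. It assumes independent single-site flips at a symmetric rate, so that the noising kernel factorises over sites and relaxes to the uniform distribution; the exact denoising step is then just Bayes' rule. The rate, the toy Ising-chain distribution and all function names are illustrative assumptions, not taken from the paper.

```python
import itertools
import math

def boltzmann_chain(n_sites, beta):
    """Toy data distribution: open Ising chain, P ~ exp(beta * sum s_i s_{i+1})."""
    states = list(itertools.product((-1, 1), repeat=n_sites))
    w = {s: math.exp(beta * sum(a * b for a, b in zip(s, s[1:]))) for s in states}
    z = sum(w.values())
    return {s: x / z for s, x in w.items()}

def noising_prob(nu, sigma, t, rate=1.0):
    """W_{t<-0}(nu | sigma) for independent symmetric spin flips."""
    p_flip = 0.5 * (1.0 - math.exp(-2.0 * rate * t))
    return math.prod((1.0 - p_flip) if a == b else p_flip
                     for a, b in zip(nu, sigma))

def noised_distribution(p0, t, rate=1.0):
    """P_t(nu) = sum_sigma W(nu | sigma) P_0(sigma)."""
    return {nu: sum(noising_prob(nu, s, t, rate) * p for s, p in p0.items())
            for nu in p0}

def denoising_posterior(p0, nu, t, rate=1.0):
    """P(sigma at time 0 | nu at time t), i.e. the exact reverse channel."""
    joint = {s: noising_prob(nu, s, t, rate) * p for s, p in p0.items()}
    norm = sum(joint.values())
    return {s: w / norm for s, w in joint.items()}

p0 = boltzmann_chain(n_sites=3, beta=1.0)
p_late = noised_distribution(p0, t=10.0)      # close to uniform, 1/8 per state
posterior = denoising_posterior(p0, nu=(1, 1, -1), t=0.5)
```

Averaging the posterior over noisy configurations drawn from `noised_distribution` returns the original `p0`, which is the sense in which denoising reverses noising. The enumeration here scales exponentially with system size; the point of the TN representation is to carry out the same contractions efficiently.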

Paper Structure (24 sections, 48 equations, 17 figures)

This paper contains 24 sections, 48 equations, 17 figures.

Introduction
Tensor networks
Probability distributions as matrix product states
Generative Discrete Diffusion Protocols with Tensor Networks
Noising protocol
Denoising protocol
Noising-denoising as generative updates for MCMC
Disconnected update
The connected update
Monte Carlo sampling via DDMs with tensor networks
Models
Fredkin spin chain
2D Ising model on a cylinder
Sampling via denoising from an exact distribution
Sampling via denoising from an approximate distribution
...and 9 more sections

Figures (17)

Figure 1: Tensor networks. Diagrammatic representation of tensor networks. (a) The probability vector $\ket{P_{{\bm \theta}}}$ as an MPS. Each vertex is a rank-3 tensor for lattice site $j$. The grey edges connecting neighbouring vertices represent a contraction over the virtual dimension of size $D$. The open black edges represent the physical dimensions. (b) The probability vector with a positivity ansatz, $\ket{P_{\bm \theta}} = \ket{\psi^{*}_{\bm \theta}} \odot \ket{\psi_{\bm \theta}}$. The physical dimensions of the two MPS are contracted with a three-point delta function.
Figure 2: The noising protocol as a TN. The distribution $\ket{P^{\bm \theta}_t} = {\mathcal{W}}_{t \leftarrow 0} \ket{P_{\bm \theta}}$ can be efficiently described by a TN. The blue and black spheres are the tensors for the probability $\ket{P_{{\bm \theta}}}$, see Fig. 1(b). The orange spheres are the tensors for the evolution operator ${\mathcal{W}}_{t \leftarrow 0}$, see Eq. \ref{eq:noising_protocol}.
Figure 3: Noising and denoising protocols. The noising protocol is a continuous-time Markov dynamics, ${\mathcal{W}}_{t \leftarrow 0}$, which progressively evolves a distribution onto the uniform distribution, $\ket{-}$. The denoising protocol is a time-inhomogeneous Markov dynamics, $\hat{{\mathcal{W}}}_{\hat{t} \leftarrow 0}$, which reverses the noising process.
Figure 4: Denoising protocol as a TN. (a) Graphical representation of Eq. \ref{eq:Q_tau_cond}: the state $\ket{\hat{P}^{\bm \theta}_{\hat{t} | \hat{\bm{\nu}}}}$ is an MPS obtained from propagating the initial $\ket{P_{\bm \theta}} = \ket{\psi^{*}} \odot \ket{\psi}$, where the blue spheres represent $\psi$, with the noising evolution operator ${\mathcal{W}}_{T-\hat{t}\leftarrow 0}$, represented by the red spheres, to obtain $\ket{P^{{\bm \theta}}_{T-\hat{t}}}$. The small grey spheres indicate the initial state $\hat{{\bm \nu}} = (\hat{\nu}_{1} ,\dots, \hat{\nu}_{N})$ for the denoising, which is acted upon by ${\mathcal{W}}^{\rm T}_{0 \to \hat{t}}$ (orange spheres, where the black circles indicate delta tensors), and multiplied element-wise with $\ket{P^{{\bm \theta}}_{T-\hat{t}}}$. Rescaling by the overall factor $1/\braket{\hat{\bm{\nu}}|P^{\bm \theta}_{T}}$ (not shown) gives $\ket{\hat{P}^{\bm \theta}_{\hat{t} | \hat{\bm{\nu}}}}$. (b) Graphical representation when $\hat{t}=T$, see Eq. \ref{eq:Qt_update}: in this case there is no propagation of $\ket{P_{\bm \theta}}$.
Figure 5: MCMC updates with DDM generated proposals. (a) Disconnected update: We sample $P({\bm \sigma})$ by sampling the joint $P_{0,T}({\bm \sigma},{\bm \nu})$ and contracting over ${\bm \nu}$, because the acceptance probability Eq. \ref{eq:Metropolis2} can be computed efficiently with TNs while the naive Eq. \ref{eq:Metropolis} cannot. A proposed new pair $(\hat{\bm \sigma},\hat{\bm \nu})$ is obtained by drawing $\hat{\bm \nu}$ from the flat distribution and applying the denoising protocol for time $T$ to generate $\hat{\bm \sigma}$. The new pair is accepted with probability Eq. \ref{eq:acceptance_correlated}. The generated $\hat{\bm \sigma}$ are always uncorrelated from the current state ${\bm \sigma}$, since the starting configuration $\hat{\bm \nu}$ of the denoising step is completely independent of the final configuration ${\bm \nu}$ of the noising step. (b) Connected update: We sample $P({\bm \sigma})$ directly. Starting from the current configuration ${\bm \sigma}$, we noise it for time $T$, producing a corrupted configuration ${\bm \nu}$. We then run denoising starting from ${\bm \nu}$, also for time $T$, to generate $\hat{\bm \sigma}$. This proposal is accepted with probability Eq. \ref{eq:acceptance_correlated}, which can be efficiently computed with TNs. The proposed $\hat{\bm \sigma}$ is correlated with the current ${\bm \sigma}$ through ${\bm \nu}$. The degree of correlation is controlled by $T$, with shorter $T$ corresponding to stronger correlation. (We show configurations sampled using an MPS for a 2D Ising model of size $N = 30 \times 30$ with open boundary conditions and at inverse temperature $\beta = 1.1\beta_{c}$, where $\beta_{c}$ is the critical inverse temperature.)
...and 12 more figures
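The connected update described in the Figure 5 caption can be sketched on a three-spin toy system by exact enumeration. This is a hedged illustration rather than the paper's algorithm: it assumes a symmetric single-site flip kernel for the noising, uses a mismatched Ising chain as a stand-in for the learned $P_{\bm \theta}$, and uses an acceptance ratio derived here under those assumptions (for a symmetric kernel the kernel factors cancel, leaving $\min\{1, P(\hat{\bm\sigma}) P_{\bm\theta}(\bm\sigma) / [P(\bm\sigma) P_{\bm\theta}(\hat{\bm\sigma})]\}$). The referenced equation in the paper may take a different form, and all names below are hypothetical.

```python
import itertools
import math
import random

N_SITES = 3
STATES = list(itertools.product((-1, 1), repeat=N_SITES))

def kernel(nu, sigma, t):
    """Symmetric noising kernel W(nu | sigma): independent spin flips, unit rate."""
    p_flip = 0.5 * (1.0 - math.exp(-2.0 * t))
    return math.prod((1.0 - p_flip) if a == b else p_flip
                     for a, b in zip(nu, sigma))

def boltzmann(beta):
    w = {s: math.exp(beta * sum(a * b for a, b in zip(s, s[1:]))) for s in STATES}
    z = sum(w.values())
    return {s: x / z for s, x in w.items()}

target = boltzmann(1.0)   # distribution we want to sample
model = boltzmann(0.6)    # imperfect stand-in for the learned P_theta

def acceptance(new, old):
    ratio = target[new] * model[old] / (target[old] * model[new])
    return min(1.0, ratio)

def connected_update(sigma, t, rng):
    """Noise sigma for time t, denoise with the model, then accept or reject."""
    nu = rng.choices(STATES, weights=[kernel(n, sigma, t) for n in STATES])[0]
    # Denoising posterior is proportional to W(nu | s) * P_theta(s).
    post = [kernel(nu, s, t) * model[s] for s in STATES]
    proposal = rng.choices(STATES, weights=post)[0]
    return proposal if rng.random() < acceptance(proposal, sigma) else sigma

def proposal_prob(new, old, t):
    """Exact probability of proposing `new` from `old`, summed over nu."""
    total = 0.0
    for nu in STATES:
        evidence = sum(kernel(nu, s, t) * model[s] for s in STATES)
        total += kernel(nu, old, t) * kernel(nu, new, t) * model[new] / evidence
    return total

rng = random.Random(0)
state = STATES[0]
chain = []
for _ in range(200):
    state = connected_update(state, t=0.3, rng=rng)
    chain.append(state)
```

With a short noising time such as `t=0.3`, the corrupted configuration stays close to the current one, so proposals are local and strongly correlated; increasing `t` moves the scheme towards independent proposals from the model.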
