Discrete Diffusion with Sample-Efficient Estimators for Conditionals

Karthik Elamvazhuthi, Abhijith Jayakumar, Andrey Y. Lokhov

Abstract

We study a discrete denoising diffusion framework that integrates a sample-efficient estimator of single-site conditionals with round-robin noising and denoising dynamics for generative modeling over discrete state spaces. Rather than approximating a discrete analog of a score function, our formulation treats single-site conditional probabilities as the fundamental objects that parameterize the reverse diffusion process. We employ a sample-efficient method known as the Neural Interaction Screening Estimator (NeurISE) to estimate these conditionals in the diffusion dynamics. Controlled experiments on synthetic Ising models, MNIST, and scientific data sets (produced by a D-Wave quantum annealer, a synthetic Potts model, and one-dimensional quantum systems) demonstrate the proposed approach. On the binary data sets, the proposed approach outperforms popular existing methods, including ratio-based approaches, in total variation, cross-correlation, and kernel density estimation metrics.
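The claim that single-site conditionals parameterize the reverse of a round-robin noising process can be checked on a toy problem. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: $q$ binary sites are encoded as the bits of an integer, forward step $n$ flips only site $n \bmod q$ with probability `eps`, and exact marginals computed by enumeration stand in for the learned NeurISE conditionals. Because only the active site $i$ can change, Bayes' rule gives a reverse step proportional to $p_n(x_i \mid y_{-i})\,K(x_i, y_i)$; with `eps = 0.5` (our hypothetical stand-in for "harsh" noise) the kernel factor is constant and the reverse step is exactly the single-site conditional. The helper names `flip_kernel`, `forward_marginals`, `reverse_kernel`, and `sample_reverse` are hypothetical.

```python
import numpy as np


def flip_kernel(q, site, eps):
    """Forward kernel K[x, y]: flip bit `site` of state x with probability eps."""
    n_states = 2 ** q
    K = np.zeros((n_states, n_states))
    for x in range(n_states):
        K[x, x] += 1.0 - eps
        K[x, x ^ (1 << site)] += eps
    return K


def forward_marginals(p0, q, T, eps):
    """Exact marginals p_0, ..., p_T of the round-robin noising chain (assumed noise model)."""
    ps = [np.asarray(p0, dtype=float)]
    for n in range(T):
        ps.append(ps[-1] @ flip_kernel(q, n % q, eps))
    return ps


def reverse_kernel(p_n, q, site, eps):
    """R[y, x] = P(X_n = x | X_{n+1} = y); only bit `site` can differ.

    Restricted to {y, y ^ bit}, p_n[x] is proportional to the single-site
    conditional p_n(x_site | y_rest), so each row of R is that conditional
    reweighted by the forward kernel and renormalized.
    """
    K = flip_kernel(q, site, eps)
    n_states = 2 ** q
    R = np.zeros((n_states, n_states))
    for y in range(n_states):
        for x in (y, y ^ (1 << site)):
            R[y, x] = p_n[x] * K[x, y]
        R[y] /= R[y].sum()
    return R


def sample_reverse(ps, q, eps, n_samples, seed=0):
    """Draw X_T ~ p_T, then denoise one site per step in reverse round-robin order."""
    rng = np.random.default_rng(seed)
    T = len(ps) - 1
    y = rng.choice(2 ** q, size=n_samples, p=ps[T])
    for n in reversed(range(T)):
        site = n % q
        R = reverse_kernel(ps[n], q, site, eps)
        flipped = y ^ (1 << site)
        # Each sample either keeps or flips the active site, per the reverse kernel.
        flip = rng.random(n_samples) < R[y, flipped]
        y = np.where(flip, flipped, y)
    return y
```

Propagating $p_T$ through the reverse kernels recovers $p_0$ exactly in this toy setting. In practice $p_n$ is only available through samples, which is where an estimator of the single-site conditionals is needed.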
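For intuition about the estimator, the sketch below implements the interaction screening objective in its simplest, linear form (the RISE estimator, which, as we understand it, NeurISE generalizes by replacing the linear local field with a neural network) for $\pm 1$ spins. For site $u$ with local field $h_\theta(\sigma_{\setminus u}) = \theta_0 + \sum_{v \ne u} \theta_v \sigma_v$, it minimizes $S(\theta) = \sum_k w_k \exp\big(-\sigma_u^{(k)} h_\theta(\sigma_{\setminus u}^{(k)})\big)$ by gradient descent and reads off $p(\sigma_u = +1 \mid \sigma_{\setminus u}) = 1/(1 + e^{-2h})$. With $n$ samples the weights are $w_k = 1/n$; the test instead uses the exact Boltzmann probabilities of a 3-spin Ising model so the minimizer can be compared with the true field. The function names, learning rate, and iteration count are illustrative assumptions; this is not the authors' NeurISE code.

```python
import itertools

import numpy as np


def exact_ising(J, h):
    """All +/-1 configurations and their probabilities under
    p(s) ∝ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i), J symmetric with zero diagonal."""
    n = len(h)
    configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    logw = np.array([0.5 * s @ J @ s + h @ s for s in configs])
    w = np.exp(logw - logw.max())
    return configs, w / w.sum()


def ise_fit(configs, weights, u, lr=0.5, iters=3000):
    """Minimize S(theta) = sum_k w_k exp(-s_u^k * h_theta(s_rest^k)) by gradient descent."""
    s_u = configs[:, u]
    # Features of the local field: a bias term plus every other spin.
    X = np.hstack([np.ones((len(configs), 1)), np.delete(configs, u, axis=1)])
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        screen = np.exp(-s_u * (X @ theta))    # per-sample screening term
        grad = -(weights * screen * s_u) @ X    # dS/dtheta
        theta -= lr * grad
    return theta


def conditional_plus(theta, rest):
    """p(s_u = +1 | rest) = 1 / (1 + exp(-2 h)), with h the fitted local field."""
    h = theta[0] + np.asarray(rest, dtype=float) @ theta[1:]
    return 1.0 / (1.0 + np.exp(-2.0 * h))
```

The objective is convex in $\theta$, and its population minimizer is the true local field, which is why the fitted coefficients match the model's field and couplings at site $u$.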

Paper Structure (21 sections, 4 theorems, 71 equations, 6 figures, 2 tables, 1 algorithm)

Key Result

Theorem 3.1

Let $\{X_n\}_{n=0}^T$ be the Markov chain on $\Sigma^q$ with forward transition kernels $k_n:\Sigma^q\times\Sigma^q\to\mathbb{R}_{\ge 0}$. Fix a noise reference distribution $\mu_{\mathrm{noise}}$ on $\Sigma^q$ and assume that, for some $\delta_T\in[0,1]$, … Let $\{k_n^{\mathrm{rev}}\}_{n=0}^{T-1}$ be a well-defined family of reverse kernels that satisfy eq:revp. Consider approximate reverse kernels …

Figures (6)

  • Figure 1: Trend of TV and cross-correlation error as a function of training set size for Ising models, averaged over $5$ Ising models and $10$ trials per data set. The test sample size was taken to be $10^5$ for each experiment. Error bars represent one standard deviation over trials.
  • Figure 2: Class-conditional MNIST samples. Each subfigure shows generated samples arranged with one row per digit (0–9).
  • Figure 3: Trend of TV for a non-binary Potts model as a function of training set size, averaged over $10$ trials for each size. The test sample size was taken to be $10^5$ for each experiment. Error bars represent one standard deviation over trials.
  • Figure 4: Trend of cross-correlation error of NeurISE Diffusion trained to learn the GHZ state as a function of training set size, averaged over $10$ trials for each size. The test sample size was taken to be $10^5$ for each experiment. Error bars represent one standard deviation over trials.
  • Figure 5: Trend of TV and training time for Ising models, averaged over $10$ trials per data set. The test sample size was taken to be $10^4$ for each experiment. The harsh-noise version of the problem performs competitively with the soft-noise version.
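The figure captions report total variation (TV) and cross-correlation error. The sketch below shows one common way to compute both from two sets of $\pm 1$ samples. It assumes that TV is taken between the empirical distributions over full configurations and that cross-correlation error is the mean absolute difference of pairwise correlations $\langle \sigma_i \sigma_j \rangle$ over $i < j$; the paper's exact definitions and normalizations may differ, and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def empirical_tv(a, b):
    """TV distance between the empirical distributions of the rows of a and b."""
    ca = Counter(map(tuple, a))
    cb = Counter(map(tuple, b))
    support = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[x] / len(a) - cb[x] / len(b)) for x in support)


def cross_correlation_error(a, b):
    """Mean |<s_i s_j>_a - <s_i s_j>_b| over site pairs i < j (assumed definition)."""
    corr_a = a.T @ a / len(a)
    corr_b = b.T @ b / len(b)
    iu = np.triu_indices(a.shape[1], k=1)
    return float(np.mean(np.abs(corr_a[iu] - corr_b[iu])))
```

The configuration-counting TV is only meaningful when the number of distinct configurations is small compared with the sample size; the correlation-based error scales to many more sites.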

Theorems & Definitions (6)

  • Theorem 3.1
  • Corollary 3.1
  • Theorem 3.1
  • proof
  • Corollary 3.0
  • proof