
Glauber Generative Model: Discrete Diffusion Models via Binary Classification

Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam

TL;DR

The paper tackles discrete generative modeling by introducing the Glauber Generative Model (GGM), which uses time-dependent Glauber dynamics to denoise discrete token sequences. A key idea is reducing denoising to a sequence of binary classification tasks, which enables an exact reverse process learned by a transformer-based model and scales linearly in the vocabulary size. Empirically, GGM outperforms prior discrete diffusion models on language generation, achieves competitive image generation without dataset-specific tokenizers, and performs well on zero-shot text and image infilling. While it does not yet surpass state-of-the-art autoregressive LLMs or GAN-based image methods, the framework is principled and scalable, and it leaves room for broader applications and extensions.

Abstract

We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models, to obtain new samples from a distribution given samples from a discrete space. GGM deploys a discrete Markov chain called the heat bath dynamics (or the Glauber dynamics) to denoise a sequence of noisy tokens to a sample from a joint distribution of discrete tokens. Our novel conceptual framework provides an exact reduction of the task of learning the denoising Markov chain to solving a class of binary classification tasks. More specifically, the model learns to classify a given token in a noisy sequence as signal or noise. In contrast, prior works on discrete diffusion models either solve regression problems to learn importance ratios, or minimize loss functions given by variational approximations. We apply GGM to language modeling and image generation, where images are discretized using image tokenizers like VQGANs. We show that it outperforms existing discrete diffusion models in language generation, and demonstrates strong performance for image generation without using dataset-specific image tokenizers. We also show that our model is capable of performing well in zero-shot control settings like text and image infilling.
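To make the forward (noising) side concrete, the sketch below corrupts a character sequence in the spirit of Figure 1 and records, for each position, whether it still carries its original token ("signal") or has been overwritten ("noise"). It is a simplification for illustration only: the cyclic position schedule, the uniform noise distribution, the function name `noise_sequence`, and the parameter `p_phi` (standing in for the probability of the "keep" outcome $\phi$) are all assumptions, and the labels are not necessarily the exact targets of the paper's binary classifiers.

```python
import random
from typing import List, Sequence, Tuple


def noise_sequence(
    tokens: Sequence[str],
    steps: int,
    vocab: Sequence[str],
    p_phi: float,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Run `steps` forward noising steps; return (noisy tokens, signal/noise labels).

    Hypothetical assumptions (not the paper's exact setup):
    - the position updated at step t is i_t = t % L (a cyclic sweep);
    - the outcome Z_t is phi ("keep the token") with probability p_phi,
      otherwise a token drawn uniformly from `vocab` (a stand-in for Pi_t(.|X)).
    """
    x = list(tokens)
    if not x:
        return [], []
    rng = random.Random(seed)
    replaced = [False] * len(x)
    for t in range(steps):
        i = t % len(x)
        if rng.random() < p_phi:
            continue  # Z_t = phi: position i is left unchanged
        x[i] = rng.choice(vocab)  # Z_t != phi: overwrite position i with a noise token
        replaced[i] = True
    # Label 1 = "signal" (original token survives), 0 = "noise" (overwritten at
    # least once, even if the noise draw happened to equal the original token).
    labels = [0 if r else 1 for r in replaced]
    return x, labels


noisy, labels = noise_sequence(
    "hello world", steps=22, vocab=list("abcdefghijklmnopqrstuvwxyz "), p_phi=0.7, seed=1
)
print("".join(noisy), labels)
```

A classifier trained on such pairs would see the noisy sequence and predict, per position, signal versus noise; how GGM turns these predictions into exact reverse-process probabilities is specified in the paper and is not reproduced by this sketch.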

Paper Structure

This paper contains 38 sections, 4 theorems, 16 equations, 8 figures, 2 tables, and 4 algorithms.

Key Result

Lemma 1

Suppose that $\Pi_t(\cdot|\mathcal{X}) = \Pi(\cdot|\mathcal{X})$ is the same for every $t$. Suppose $\Pi_t(\phi) \leq 1-\epsilon$ for some $\epsilon > 0$. As $T \to \infty$, the distribution of $X_T$ converges to $\Pi(\cdot|\mathcal{X})^{\otimes L}$ in total variation distance. Specifically, we have

Figures (8)

  • Figure 1: Example of Glauber dynamics in a discrete token space, where the tokens are characters.
  • Figure 2: Comparison of $256 \times 256$ conditional generations (middle) from our model on CelebA-HQ given masked inputs in the token space (top) with the ground-truth images (bottom).
  • Figure 3: Comparison of $256 \times 256$ conditional generations (middle) from our model on CelebA-HQ given masked inputs in the original pixel space (top) with the ground-truth images (bottom).
  • Figure 4: Nearest neighbors for $256 \times 256$ unconditional generations from GGM on CelebA-HQ. Row $1, 3$: unconditional generations from our model. Row $2$: nearest neighbors from the training data for row $1$ images. Row $4$: nearest neighbors from the training data for row $3$ images.
  • Figure 5: More $256\times 256$ unconditional generations from our model trained on the FFHQ dataset.
  • ...and 3 more figures

Theorems & Definitions (10)

  • Lemma 1
  • Lemma 2
  • Proof
  • Remark 1
  • Lemma 3
  • Theorem 1
  • Remark 2
  • Proof
  • Proof
  • Proof