Unrestrained Simplex Denoising for Discrete Data. A Non-Markovian Approach Applied to Graph Generation

Yoann Boget, Alexandros Kalousis

Abstract

Denoising models such as Diffusion or Flow Matching have recently advanced generative modeling for discrete structures, yet most approaches operate directly in the discrete state space, causing abrupt state changes. We introduce simplex denoising, a simple yet effective generative framework that operates on the probability simplex. The key idea is a non-Markovian noising scheme in which, for a given clean data point, noisy representations at different times are conditionally independent. While preserving the theoretical guarantees of denoising-based generative models, our method removes unnecessary constraints, thereby improving performance and simplifying the formulation. Empirically, \emph{unrestrained simplex denoising} surpasses strong discrete diffusion and flow-matching baselines across synthetic and real-world graph benchmarks. These results highlight the probability simplex as an effective framework for discrete generative modeling.

Paper Structure

This paper contains 87 sections, 5 theorems, 58 equations, 9 figures, 15 tables.

Key Result

Proposition 3.1

Assume ${\mathbf{x}} \sim \text{Dir}({\bm{1}} + \alpha_t{\bm{e}}_i)$. Then,

Figures (9)

  • Figure 1: Upper row: Noising with linear interpolant creates discontinuities and, for $t > 0.5$, all points remain in their Voronoi region. Lower row: Noising with explicit parametric probability path avoids discontinuities.
  • Figure 2: Evolution of metrics on ZINC250K as a function of the number of function evaluations (NFE).
  • Figure 3: Voronoi probabilities over time for ${\mathbf{x}}_t \in {\mathbb{S}}_3 \sim \text{Dir}({\bm{1}} + \alpha_t{\mathbf{x}}_1)$, with $\alpha_t = -a\log(1-t)$ for various values of $a$. For $a=1$, $P_{v_k}$ increases rapidly as $t \to 1$; for $a=10$, $P_{v_k}$ is already close to $1$ by $t=0.6$. Suitable choices of $a$ therefore likely lie between $1$ and $10$.
  • Figure 4: Voronoi probabilities for various values $a$ of the scheduler $\alpha_t = -a\log(1-t)$ with $K=9$ (left) and $K=2$ (right).
  • Figure 5: Comparison between uniform prior and marginal weighted mixture of Dirichlet.
  • ...and 4 more figures
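The noising scheme described in the figure captions, ${\mathbf{x}}_t \sim \text{Dir}({\bm{1}} + \alpha_t {\bm{e}}_k)$ with scheduler $\alpha_t = -a\log(1-t)$, can be probed numerically. The sketch below (an illustrative Monte Carlo estimate, not the authors' code; the function names and sample sizes are my own) estimates the Voronoi probability $P_{v_k}$, i.e. the probability that a noisy simplex point still has its largest coordinate at the clean class $k$, for the scheduler values $a=1$ and $a=10$ discussed in Figure 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha(t, a):
    # Scheduler from the figure captions: alpha_t = -a * log(1 - t)
    return -a * np.log1p(-t)

def voronoi_prob(t, a, K=3, n=20000):
    """Monte Carlo estimate of P(argmax x_t = k) for
    x_t ~ Dir(1 + alpha_t * e_k): the probability that the noisy
    point lies in the Voronoi region of the clean class k (here k=0)."""
    conc = np.ones(K)
    conc[0] += alpha(t, a)
    samples = rng.dirichlet(conc, size=n)
    return (samples.argmax(axis=1) == 0).mean()

for a in (1.0, 10.0):
    probs = [voronoi_prob(t, a) for t in (0.2, 0.6, 0.95)]
    print(f"a={a}:", [round(p, 2) for p in probs])
```

At $t=0$ the concentration is uniform, so each class is the argmax with probability $1/K$; as $t \to 1$ the mass concentrates on the clean class. Consistent with Figure 3, $a=10$ drives $P_{v_k}$ near $1$ already by $t=0.6$, while $a=1$ does so only as $t \to 1$.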

Theorems & Definitions (5)

  • Proposition 3.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 4.4
  • Proposition 1.1: Convergence to the stationary distribution