SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups

Yongxing Zhang, Donglin Yang, Renjie Liao

TL;DR

This work tackles learning probability distributions over the finite symmetric group $S_n$, whose size grows factorially in $n$, by introducing SymmetricDiffusers, a discrete diffusion framework that decomposes the problem into learning simpler reverse transitions. The forward process uses riffle shuffles, with the number of diffusion steps chosen using mixing-time theory, while the reverse process uses a Generalized Plackett-Luce (GPL) distribution together with a denoising schedule to improve sampling efficiency. Key contributions include proving that GPL is more expressive than the Plackett-Luce (PL) distribution over $S_n$, providing principled guidance for the diffusion length, and demonstrating strong empirical performance on sorting 4-digit MNIST images, jigsaw puzzles, and the traveling salesman problem (TSP), with public code. The approach supports permutation modeling for ranking, combinatorial tasks, and related optimization problems; the authors acknowledge its $O(n^2)$ complexity and outline future extensions to broader finite groups and Lie groups.

Abstract

Finite symmetric groups $S_n$ are essential in fields such as combinatorics, physics, and chemistry. However, learning a probability distribution over $S_n$ poses significant challenges due to its intractable size and discrete nature. In this paper, we introduce SymmetricDiffusers, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$ by decomposing it into learning simpler transitions of the reverse diffusion using deep neural networks. We identify the riffle shuffle as an effective forward transition and provide empirical guidelines for selecting the diffusion length based on the theory of random walks on finite groups. Additionally, we propose a generalized Plackett-Luce (GPL) distribution for the reverse transition, which is provably more expressive than the Plackett-Luce (PL) distribution. We further introduce a theoretically grounded "denoising schedule" to improve sampling and learning efficiency. Extensive experiments show that our model achieves state-of-the-art or comparable performance on tasks including sorting 4-digit MNIST images, jigsaw puzzles, and traveling salesman problems. Our code is released at https://github.com/DSL-Lab/SymmetricDiffusers.
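To make the forward transition concrete, here is a minimal sketch of a riffle-shuffle random walk on $S_n$. It assumes the standard Gilbert-Shannon-Reeds (GSR) model of a riffle shuffle; the abstract only says "riffle shuffle", so the exact variant used by the authors may differ. The names `riffle_shuffle` and `forward_diffuse` are hypothetical and are not taken from the released code.

```python
import random


def riffle_shuffle(perm, rng):
    """Apply one riffle shuffle to a list (GSR model, an assumption here)."""
    n = len(perm)
    # Cut the deck at a Binomial(n, 1/2) position.
    cut = sum(rng.random() < 0.5 for _ in range(n))
    left, right = perm[:cut], perm[cut:]
    out, i, j = [], 0, 0
    # Drop items one at a time, picking a pile with probability
    # proportional to the number of items it still holds.
    while i < len(left) or j < len(right):
        a, b = len(left) - i, len(right) - j
        if rng.random() * (a + b) < a:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


def forward_diffuse(perm, num_steps, seed=0):
    """Run num_steps riffle shuffles starting from perm (illustrative only)."""
    rng = random.Random(seed)
    x = list(perm)
    for _ in range(num_steps):
        x = riffle_shuffle(x, rng)
    return x
```

Calling `forward_diffuse(range(10), 15)` returns a shuffled ordering of the ten items. In this sketch each step only interleaves two piles, so a single step keeps a lot of the original order, and many steps are needed before the result looks uniform.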

Paper Structure

This paper contains 51 sections, 9 theorems, 35 equations, 3 figures, and 14 tables.

Key Result

Proposition 0

The PL distribution cannot represent a delta distribution over $S_n$.
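One way to see why this holds is to enumerate the PL distribution on a small group. The sketch below assumes the standard PL parameterization with one finite real-valued score per item, where items are drawn one at a time without replacement with probability proportional to `exp(score)`. The helper names `pl_prob` and `pl_distribution` are hypothetical, and this is an illustration rather than the paper's proof.

```python
import itertools
import math


def pl_prob(perm, scores):
    """Probability of the ordering perm under PL with the given finite scores."""
    weights = [math.exp(scores[i]) for i in perm]
    prob = 1.0
    for k in range(len(perm)):
        # Item perm[k] is picked among the items not yet placed.
        prob *= weights[k] / sum(weights[k:])
    return prob


def pl_distribution(scores):
    """Map every permutation of range(len(scores)) to its PL probability."""
    items = range(len(scores))
    return {p: pl_prob(p, scores) for p in itertools.permutations(items)}


probs = pl_distribution([5.0, 0.0, -5.0])
# Every factor in pl_prob is strictly positive for finite scores, so every
# ordering gets nonzero mass and no single ordering can receive probability 1.
assert all(p > 0 for p in probs.values())
assert max(probs.values()) < 1.0
```

Making the scores more extreme pushes mass toward one ordering, but under the finite-score assumption the remaining orderings never reach exactly zero probability.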

Figures (3)

  • Figure 1: This figure illustrates our discrete diffusion model on finite symmetric groups. The middle graphical model displays the forward and reverse diffusion processes. We demonstrate learning distributions over the symmetric group $S_3$ via the task of sorting three 4-digit MNIST images. The top part of the figure shows the marginal distribution of a ranked list of images $X_t$ at time $t$, while the bottom shows a randomly drawn list of images.
  • Figure 2: (a) $D_{\mathrm{TV}}(q^{(t)}_{\mathrm{RS}}, u)$ computed using Eq. \ref{tvbetweenu}. We choose $T=15$ (red dot) based on the threshold $0.005$. (b) A heatmap for $D_{\mathrm{TV}}(q^{(t)}_{\mathrm{RS}}, q^{(t^{\prime})}_{\mathrm{RS}})$ for $n=100$ and $1\leq t<t^{\prime}\leq 15$, computed using Eq. \ref{tvbetweenrs}. Rows are $t$ and columns are $t^{\prime}$. We choose the denoising schedule $[0,8,10,15]$.
  • Figure 3: A simple example for the GPL expressiveness theorem on $S_3$.
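
The quantity plotted in Figure 2(a) can be sketched numerically. The code below assumes the GSR riffle shuffle and uses the classical Bayer-Diaconis result: after $t$ shuffles, a permutation with $r$ rising sequences has probability $\binom{2^t+n-r}{n}/2^{tn}$, and the number of permutations of $n$ items with $r$ rising sequences is an Eulerian number. Whether the paper's referenced equation has exactly this form is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def eulerian_row(n):
    """Return [A(n, 0), ..., A(n, n-1)], permutations of n items by descent count."""
    row = [1]  # A(1, 0) = 1
    for m in range(2, n + 1):
        new = [0] * m
        for k in range(m):
            same = row[k] if k < len(row) else 0   # A(m-1, k)
            prev = row[k - 1] if k >= 1 else 0     # A(m-1, k-1)
            new[k] = (k + 1) * same + (m - k) * prev
        row = new
    return row


def tv_riffle_uniform(n, t):
    """Total variation distance between t riffle shuffles and uniform on S_n."""
    a = 2 ** t
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    # The entry at index r - 1 counts permutations with r rising sequences.
    for r, count in enumerate(eulerian_row(n), start=1):
        p = Fraction(comb(a + n - r, n), a ** n)
        total += count * abs(p - uniform)
    return float(total / 2)


def smallest_length(n, threshold, max_t=64):
    """Smallest t whose distance to uniform falls below threshold (or None)."""
    for t in range(1, max_t + 1):
        if tv_riffle_uniform(n, t) < threshold:
            return t
    return None
```

A call such as `smallest_length(100, 0.005)` mirrors the thresholding rule described in the caption. Exact rational arithmetic is used so that the very small probabilities do not underflow.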

Theorems & Definitions (14)

  • Proposition 0
  • Theorem 0
  • Proposition 0
  • Proposition 0
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 3
  • ...and 4 more