Learning Permutation Distributions via Reflected Diffusion on Ranks

Sizhuang He, Yangtian Zhang, Shiyang Zhang, David van Dijk

Abstract

The finite symmetric group S_n provides a natural domain for permutations, yet learning probability distributions on S_n is challenging due to its factorially growing size and discrete, non-Euclidean structure. Recent permutation diffusion methods define forward noising via shuffle-based random walks (e.g., riffle shuffles) and learn reverse transitions with Plackett-Luce (PL) variants, but the resulting trajectories can be abrupt and increasingly hard to denoise as n grows. We propose Soft-Rank Diffusion, a discrete diffusion framework that replaces shuffle-based corruption with a structured soft-rank forward process: we lift permutations to a continuous latent representation of order by relaxing discrete ranks into soft ranks, yielding smoother and more tractable trajectories. For the reverse process, we introduce contextualized generalized Plackett-Luce (cGPL) denoisers that generalize prior PL-style parameterizations and improve expressivity for sequential decision structures. Experiments on sorting and combinatorial optimization benchmarks show that Soft-Rank Diffusion consistently outperforms prior diffusion baselines, with particularly strong gains in long-sequence and intrinsically sequential settings.
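
As a rough illustration of the soft-rank idea, the following sketch lifts a permutation to continuous soft ranks, perturbs them with noise reflected back into a bounded interval, and recovers a permutation by sorting. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the grid mapping `(r + 0.5) / n`, the unit interval, the Gaussian increments, and the fixed step size `sigma` are all hypothetical choices made for illustration.

```python
import random


def reflect(x, lo=0.0, hi=1.0):
    """Fold x back into [lo, hi] by mirror reflection at both boundaries."""
    width = hi - lo
    y = (x - lo) % (2.0 * width)
    return lo + (y if y <= width else 2.0 * width - y)


def lift_to_soft_ranks(ranks):
    """Map discrete ranks 0..n-1 to evenly spaced points inside (0, 1).

    ranks[i] is the rank of item i. The (r + 0.5) / n grid is an assumption.
    """
    n = len(ranks)
    return [(r + 0.5) / n for r in ranks]


def soft_ranks_to_perm(z):
    """Recover discrete ranks by sorting the soft ranks."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    ranks = [0] * len(z)
    for position, item in enumerate(order):
        ranks[item] = position
    return ranks


def forward_step(z, sigma, rng):
    """One noising step: add Gaussian noise, then reflect into [0, 1].

    This discretization of a reflected diffusion is an assumption.
    """
    return [reflect(v + rng.gauss(0.0, sigma)) for v in z]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [2, 0, 3, 1]
    z = lift_to_soft_ranks(clean)
    for _ in range(20):
        z = forward_step(z, sigma=0.1, rng=rng)
    print(soft_ranks_to_perm(z))
```

With no noise, sorting the lifted soft ranks returns the original permutation. As noise accumulates, neighbouring items tend to swap before distant ones do, which is one way to read the claim that the trajectories are smoother than shuffle-based corruption.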

Paper Structure (35 sections, 32 equations, 4 figures, 4 tables, 2 algorithms)

Figures (4)

  • Figure 1: Soft-Rank Diffusion. We define the forward diffusion by relaxing each item's rank to a continuous soft-rank variable and evolving these soft ranks in a reflected diffusion process (Lou and Ermon, 2023). At each time $t$, the soft ranks induce a discrete ordering by simple sorting, thereby yielding a forward process in permutation space. For reverse sampling, we couple a discrete denoiser in permutation space (predicting a clean permutation from $X_t$) with an auxiliary continuous update: we lift the predicted permutation to a grid-aligned soft-rank vector $\hat{Z}_0$, sample an intermediate latent $Z_s$ for $s<t$ from a conditional reverse kernel $p(Z_s \mid Z_t, \hat{Z}_0, Z_1)$, and map back to permutation space by sorting $Z_s$, thus stepping backward in time.
  • Figure 2: Sampling in PL/GPL/cGPL. (a) PL and GPL. In PL, each item is assigned a single scalar score; sampling a permutation amounts to repeatedly sampling from the same score vector without replacement, masking selected items and renormalizing at each step. In GPL, each item is assigned a length-$N$ score vector, yielding position-specific logits: at step $i$ we sample according to the $i$-th column of the score matrix (after masking previously selected items), proceeding sequentially from $i=1$ to $N$. (b) cGPL. In cGPL, each item is assigned a position-dependent score vector dynamically. Sampling proceeds autoregressively: the score vector at later positions depends on the outcomes sampled at preceding positions. As in GPL, we apply masking and subsequent renormalization to obtain a valid probability distribution over the remaining items.
  • Figure 3: Exact-match accuracy and relative improvement versus SymmetricDiffusers as a function of sequence length $N$. The accuracy panel shows accuracy on a logit scale. The gain panel reports the accuracy ratio of Soft-Rank Diffusion relative to SymmetricDiffusers on a log scale. Starred points mark sequence lengths where SymmetricDiffusers attains zero accuracy; the corresponding ratios are undefined and are clipped to the top of the plotted range for visualization.
  • Figure 4: Model architecture and Pointer-cGPL parameterization for permutation generation. We adopt a standard encoder-decoder Transformer backbone (Vaswani et al.). In the vanilla cGPL parameterization, the decoder states are mapped to logits via a linear output head (yielding a distribution over candidates at each step). In contrast, the Pointer-cGPL panel illustrates a bi-affine compatibility module that scores each encoded item representation against the current decoder state, producing step-wise logits over the input items that can be interpreted as a pointer distribution.
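
The three sampling schemes described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch only, assuming logits are given as plain Python lists. The names `sample_masked`, `sample_pl`, `sample_gpl`, and `sample_cgpl` are hypothetical, and the callback `score_fn` is a hypothetical stand-in for the learned denoiser, which in the paper is a Transformer.

```python
import math
import random


def sample_masked(logits, taken, rng):
    """Sample one item from softmax(logits) restricted to items not yet taken."""
    candidates = [i for i in range(len(logits)) if i not in taken]
    top = max(logits[i] for i in candidates)  # subtract max for stability
    weights = [math.exp(logits[i] - top) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def sample_pl(scores, rng):
    """PL: one scalar score per item, reused at every step."""
    taken, perm = set(), []
    for _ in range(len(scores)):
        item = sample_masked(scores, taken, rng)
        perm.append(item)
        taken.add(item)
    return perm


def sample_gpl(score_matrix, rng):
    """GPL: score_matrix[item][step]; step i samples from column i."""
    n = len(score_matrix)
    taken, perm = set(), []
    for step in range(n):
        column = [score_matrix[item][step] for item in range(n)]
        item = sample_masked(column, taken, rng)
        perm.append(item)
        taken.add(item)
    return perm


def sample_cgpl(score_fn, n, rng):
    """cGPL: logits at each step depend on the items sampled so far.

    score_fn(prefix) returns a length-n list of logits given the prefix tuple.
    """
    taken, perm = set(), []
    for _ in range(n):
        logits = score_fn(tuple(perm))
        item = sample_masked(logits, taken, rng)
        perm.append(item)
        taken.add(item)
    return perm


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_pl([0.5, 2.0, -1.0, 0.0], rng))
    print(sample_gpl([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], rng))
    print(sample_cgpl(lambda prefix: [float(len(prefix))] * 3, 3, rng))
```

The three functions share the same masking and renormalization step and differ only in where the logits come from: a fixed vector (PL), a fixed matrix indexed by position (GPL), or a function of the sampled prefix (cGPL). PL is the special case of GPL in which every column is identical, and GPL is the special case of cGPL in which `score_fn` ignores the content of the prefix and uses only its length.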