Sinkhorn-Drifting Generative Models

Ping He, Om Khangaonkar, Hamed Pirsiavash, Yikun Bai, Soheil Kolouri

Abstract

We establish a theoretical link between the recently proposed "drifting" generative dynamics and gradient flows induced by the Sinkhorn divergence. In a particle discretization, the drift field admits a cross-minus-self decomposition: an attractive term toward the target distribution and a repulsive/self-correction term toward the current model, both expressed via one-sided normalized Gibbs kernels. We show that Sinkhorn divergence yields an analogous cross-minus-self structure, but with each term defined by entropic optimal-transport couplings obtained through two-sided Sinkhorn scaling (i.e., enforcing both marginals). This provides a precise sense in which drifting acts as a surrogate for a Sinkhorn-divergence gradient flow, interpolating between one-sided normalization and full two-sided Sinkhorn scaling. Crucially, this connection resolves an identifiability gap in prior drifting formulations: leveraging the definiteness of the Sinkhorn divergence, we show that zero drift (equilibrium of the dynamics) implies that the model and target measures match. Experiments show that Sinkhorn drifting reduces sensitivity to kernel temperature and improves one-step generative quality, trading off additional training time for a more stable optimization, without altering the inference procedure used by drift methods. These theoretical gains translate to strong low-temperature improvements in practice: on FFHQ-ALAE at the lowest temperature setting we evaluate, Sinkhorn drifting reduces mean FID from 187.7 to 37.1 and mean latent EMD from 453.3 to 144.4, while on MNIST it preserves full class coverage across the temperature sweep. Project page: https://mint-vu.github.io/SinkhornDrifting/
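To make the distinction between one-sided normalization and two-sided Sinkhorn scaling concrete, here is a minimal NumPy sketch. The Gaussian Gibbs kernel form `exp(-||x - y||^2 / tau)`, the uniform marginals, and all function names (`gibbs_kernel`, `one_sided`, `sinkhorn_scaling`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gibbs_kernel(x, y, tau):
    """Gibbs kernel exp(-||x - y||^2 / tau) between two point sets (assumed form)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / tau)

def one_sided(K):
    """Row-normalize K so every row sums to 1/n; columns are left unconstrained."""
    n = K.shape[0]
    return K / K.sum(axis=1, keepdims=True) / n

def sinkhorn_scaling(K, num_iters=500):
    """Alternate column and row rescaling so both marginals approach uniform."""
    n, m = K.shape
    a = np.full(n, 1.0 / n)  # target row marginal
    b = np.full(m, 1.0 / m)  # target column marginal
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(num_iters):
        v = b / (K.T @ u)    # match the column marginal
        u = a / (K @ v)      # match the row marginal
    return u[:, None] * K * v[None, :]

# Toy data: two small 2D point clouds (hypothetical example values).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
y = rng.normal(size=(6, 2)) + 1.0

K = gibbs_kernel(x, y, tau=2.0)
P_one = one_sided(K)         # one-sided normalization
P_two = sinkhorn_scaling(K)  # two-sided Sinkhorn scaling

print("one-sided column sums:", P_one.sum(axis=0))
print("two-sided column sums:", P_two.sum(axis=0))
```

In this sketch both matrices have row sums of 1/n, but only `P_two` also has (approximately) uniform column sums, which is what "enforcing both marginals" refers to.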

Paper Structure (74 sections, 25 theorems, 204 equations, 10 figures, 3 tables, 1 algorithm)

Key Result

Proposition 3.1

Under the finite-sample approximation $\hat{p}_{\mathrm{data}}=\sum_{j=1}^n\frac{1}{n}\delta_{y^j}$ and $\hat{q}:=\hat{q}_X=\sum_{i=1}^n\frac{1}{n}\delta_{x^i}$, the above Wasserstein gradient flow becomes [equation omitted], and the forward Euler step (eq:wgf_step) becomes [equation omitted], where $\pi_{XY}^\infty=\pi^\infty_{q_X,p}$ and $\pi_{XX}^\infty$ is defined similarly.
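The displayed equations of Proposition 3.1 are not reproduced here, so the following is only a sketch of what a particle update with cross-minus-self structure can look like. It assumes a squared Euclidean cost, uniform weights, and a drift equal to the barycentric projection under the cross coupling minus the barycentric projection under the self coupling; the paper's exact constants and scaling may differ. The names `sinkhorn_coupling`, `sinkhorn_drift`, `euler_step`, `eps`, and `step_size` are hypothetical, and the default of 30 Sinkhorn iterations mirrors the fixed iteration count mentioned in the Figure 1 caption.

```python
import numpy as np

def _logsumexp(M, axis):
    """Numerically stable log-sum-exp along one axis."""
    top = M.max(axis=axis, keepdims=True)
    return (top + np.log(np.exp(M - top).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_coupling(x, y, eps, num_iters=30):
    """Entropic OT coupling between uniform point clouds (log-domain Sinkhorn)."""
    n, m = len(x), len(y)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # assumed cost
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(num_iters):
        # Column update, then row update (two-sided scaling).
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)

def sinkhorn_drift(x, y, eps, num_iters=30):
    """Cross-minus-self drift: attraction toward y minus self-correction on x."""
    n = len(x)
    pi_xy = sinkhorn_coupling(x, y, eps, num_iters)  # cross coupling
    pi_xx = sinkhorn_coupling(x, x, eps, num_iters)  # self coupling
    cross_term = n * pi_xy @ y  # rows of n * pi_xy sum to 1
    self_term = n * pi_xx @ x
    return cross_term - self_term

def euler_step(x, y, eps, step_size, num_iters=30):
    """One forward Euler update of the particles x along the drift."""
    return x + step_size * sinkhorn_drift(x, y, eps, num_iters)

# Toy usage: model particles x move toward target samples y.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 2))
y = rng.normal(size=(32, 2)) + 3.0
x_next = euler_step(x, y, eps=1.0, step_size=0.5)
```

When the model particles coincide with the target samples, the cross and self couplings in this sketch are identical and the drift vanishes, which is the easy direction of the equilibrium statement in the abstract.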

Figures (10)

  • Figure 1: Drift trajectories across different values of $\tau$ under one-sided, two-sided, and Sinkhorn normalization, with and without self-distance masking. The Sinkhorn trajectories are generated with a fixed number of iterations, $30$.
  • Figure 2: Generative model training on 2D distributions across $\tau\in\{0.01,0.05,0.1\}$ and three normalization schemes (one-sided, two-sided, Sinkhorn) with Gaussian kernel. Left six columns: final generated samples (orange) vs. target (blue). Right two columns: $W_2^2$ convergence curves over $5{,}000$ iterations. Sinkhorn consistently achieves lower $W_2^2$ and better mode coverage, especially at small $\tau$.
  • Figure 3: Generated MNIST samples (Gaussian kernel). Each panel shows 10 classes $\times$ 8 samples. (a) Baseline at $\tau=0.01$ collapses to a single degenerate mode; class accuracy is ${\approx}10\%$ (random chance). (b) Sinkhorn at $\tau=0.01$ correctly generates all ten classes with 100% class accuracy. (c,d) At $\tau=0.1$ both methods produce recognizable digits, with Sinkhorn remaining sharper and more consistent.
  • Figure 4: Qualitative comparison of class-conditional FFHQ generation at $\tau{=}1.0$ (top) and $\tau{=}10.0$ (bottom). In each panel, each row corresponds to one class; Baseline is on the left and Sinkhorn is on the right. The corresponding low-temperature qualitative panel ($\tau{=}0.1$) is shown in Figure \ref{fig:alae_appendix_tau01} of Appendix \ref{sec:appendix_alae}.
  • Figure 5: Full drift-trajectory grid for the temperature sweep in Section \ref{sec:drift_tau}. Columns correspond to one-sided, two-sided, and Sinkhorn normalization, with and without self-distance masking where applicable; rows correspond to different values of $\tau$. Sinkhorn exhibits the most stable trajectories as $\tau$ decreases, whereas one-sided and two-sided normalization become increasingly sensitive in the low-temperature regime.
  • ...and 5 more figures

Theorems & Definitions (59)

  • Proposition 3.1
  • Remark 3.1
  • Proposition 3.2
  • Proof
  • Remark 3.2
  • Proposition 3.3
  • Remark 3.3
  • Proposition 3.4
  • Proposition 3.5
  • Remark 3.4
  • ...and 49 more