Sinkhorn-Drifting Generative Models

Ping He; Om Khangaonkar; Hamed Pirsiavash; Yikun Bai; Soheil Kolouri

Abstract

We establish a theoretical link between the recently proposed "drifting" generative dynamics and gradient flows induced by the Sinkhorn divergence. In a particle discretization, the drift field admits a cross-minus-self decomposition: an attractive term toward the target distribution and a repulsive/self-correction term toward the current model, both expressed via one-sided normalized Gibbs kernels. We show that Sinkhorn divergence yields an analogous cross-minus-self structure, but with each term defined by entropic optimal-transport couplings obtained through two-sided Sinkhorn scaling (i.e., enforcing both marginals). This provides a precise sense in which drifting acts as a surrogate for a Sinkhorn-divergence gradient flow, interpolating between one-sided normalization and full two-sided Sinkhorn scaling. Crucially, this connection resolves an identifiability gap in prior drifting formulations: leveraging the definiteness of the Sinkhorn divergence, we show that zero drift (equilibrium of the dynamics) implies that the model and target measures match. Experiments show that Sinkhorn drifting reduces sensitivity to kernel temperature and improves one-step generative quality, trading off additional training time for a more stable optimization, without altering the inference procedure used by drift methods. These theoretical gains translate to strong low-temperature improvements in practice: on FFHQ-ALAE at the lowest temperature setting we evaluate, Sinkhorn drifting reduces mean FID from 187.7 to 37.1 and mean latent EMD from 453.3 to 144.4, while on MNIST it preserves full class coverage across the temperature sweep. Project page: https://mint-vu.github.io/SinkhornDrifting/
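The difference between one-sided normalization and two-sided Sinkhorn scaling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the squared Euclidean cost, the uniform marginals, the log-domain updates, and the function names `gibbs_log_kernel`, `one_sided`, and `sinkhorn` are all assumptions made for illustration.

```python
import numpy as np

def gibbs_log_kernel(x, y, tau):
    """Log of the Gibbs kernel exp(-||x - y||^2 / tau) (assumed cost)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return -d2 / tau

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def one_sided(log_k):
    """One-sided normalization: every row sums to 1 (a row-wise softmax)."""
    return np.exp(log_k - logsumexp(log_k, axis=1)[:, None])

def sinkhorn(log_k, n_iters=30):
    """Two-sided scaling toward uniform marginals 1/n (rows) and 1/m (columns)."""
    n, m = log_k.shape
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        # Alternate: rescale rows to mass 1/n, then columns to mass 1/m.
        f = -np.log(n) - logsumexp(log_k + g[None, :], axis=1)
        g = -np.log(m) - logsumexp(log_k + f[:, None], axis=0)
    return np.exp(log_k + f[:, None] + g[None, :])
```

One-sided normalization constrains only the rows, so several model particles can place almost all of their weight on the same target point. Sinkhorn scaling additionally constrains the column sums, so every target point receives the same total mass. The log-domain form is a design choice in this sketch that avoids underflow when `tau` is small.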

Paper Structure (74 sections, 25 theorems, 204 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Background and Notations
Drift Method
Drift Velocity Field.
Drift generative model.
Entropic OT and Sinkhorn divergence
Sinkhorn algorithm
Wasserstein Gradient flows.
Wasserstein Gradient Flow of Sinkhorn-Divergence
Sinkhorn Divergence Flow Model
Relation to Drift Field Method
Training Loss and Algorithm.
Discussion of the identity
Zero Sinkhorn drift in a smooth-density setting.
Zero Empirical Sinkhorn Drift implies the Identity.
...and 59 more sections

Key Result

Proposition 3.1

Under the finite-sample approximation $\hat{p}_{\mathrm{data}}=\frac{1}{n}\sum_{j=1}^n\delta_{y^j}$ and $\hat{q}:=\hat{q}_X=\frac{1}{n}\sum_{i=1}^n\delta_{x^i}$, the above Wasserstein gradient flow and its forward Euler step (eq:wgf_step) reduce to particle updates expressed through the entropic optimal-transport couplings $\pi_{XY}^\infty=\pi^\infty_{q_X,p}$ and $\pi_{XX}^\infty$, where $\pi_{XX}^\infty$ is defined similarly.
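A particle update of this kind can be sketched in NumPy as below. This is a hedged illustration rather than the proposition's exact statement: it assumes the drift at each particle is the difference of two barycentric projections (cross coupling onto the data minus self coupling onto the model), a squared Euclidean cost, uniform weights, and a fixed budget of 30 Sinkhorn iterations as in the Figure 1 caption. The names `sinkhorn_plan`, `sinkhorn_drift`, `euler_step`, and the `step` parameter are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_plan(x, y, tau, n_iters=30):
    """Approximate entropic OT coupling between uniform measures on x and y."""
    log_k = -((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) / tau
    n, m = log_k.shape
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        f = -np.log(n) - _logsumexp(log_k + g[None, :], axis=1)
        g = -np.log(m) - _logsumexp(log_k + f[:, None], axis=0)
    return np.exp(log_k + f[:, None] + g[None, :])

def sinkhorn_drift(x, y, tau, n_iters=30):
    """Cross-minus-self drift (assumed form): attraction to data, self-correction."""
    n = x.shape[0]
    pi_xy = sinkhorn_plan(x, y, tau, n_iters)  # model-to-data coupling
    pi_xx = sinkhorn_plan(x, x, tau, n_iters)  # model-to-model coupling
    cross = (n * pi_xy) @ y  # rows of n * pi sum to about 1: weighted data mean
    self_ = (n * pi_xx) @ x  # weighted mean over the current particles
    return cross - self_

def euler_step(x, y, tau, step=1.0, n_iters=30):
    """One forward Euler step of the particles along the drift."""
    return x + step * sinkhorn_drift(x, y, tau, n_iters)
```

In this sketch, setting the data points equal to the model particles makes the two couplings coincide, so the drift vanishes. That is the easy direction of the identifiability result; the converse (zero drift implies matching measures) is what the paper establishes.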

Figures (10)

Figure 1: Drift trajectories across different values of $\tau$ under one-sided, two-sided, and Sinkhorn normalization, with and without self-distance masking. The Sinkhorn trajectories are generated with a fixed number of iterations, $30$.
Figure 2: Generative model training on 2D distributions across $\tau\in\{0.01,0.05,0.1\}$ and three normalization schemes (one-sided, two-sided, Sinkhorn) with Gaussian kernel. Left six columns: final generated samples (orange) vs. target (blue). Right two columns: $W_2^2$ convergence curves over $5{,}000$ iterations. Sinkhorn consistently achieves lower $W_2^2$ and better mode coverage, especially at small $\tau$.
Figure 3: Generated MNIST samples (Gaussian kernel). Each panel shows 10 classes $\times$ 8 samples. (a) Baseline at $\tau=0.01$ collapses to a single degenerate mode; class accuracy is ${\approx}10\%$ (random chance). (b) Sinkhorn at $\tau=0.01$ correctly generates all ten classes with 100% class accuracy. (c,d) At $\tau=0.1$ both methods produce recognizable digits, with Sinkhorn remaining sharper and more consistent.
Figure 4: Qualitative comparison of class-conditional FFHQ generation at $\tau{=}1.0$ (top) and $\tau{=}10.0$ (bottom). In each panel, each row corresponds to one class; Baseline is on the left and Sinkhorn is on the right. The corresponding low-temperature qualitative panel ($\tau{=}0.1$) is shown in Figure \ref{fig:alae_appendix_tau01} of Appendix \ref{sec:appendix_alae}.
Figure 5: Full drift-trajectory grid for the temperature sweep in Section \ref{sec:drift_tau}. Columns correspond to one-sided, two-sided, and Sinkhorn normalization, with and without self-distance masking where applicable; rows correspond to different values of $\tau$. Sinkhorn exhibits the most stable trajectories as $\tau$ decreases, whereas one-sided and two-sided normalization become increasingly sensitive in the low-temperature regime.
...and 5 more figures

Theorems & Definitions (59)

Proposition 3.1
Remark 3.1
Proposition 3.2
proof
Remark 3.2
Proposition 3.3
Remark 3.3
Proposition 3.4
Proposition 3.5
Remark 3.4
...and 49 more
