Generative Modeling of Discrete Joint Distributions by E-Geodesic Flow Matching on Assignment Manifolds
Bastian Boll, Daniel Gonzalez-Alvarado, Christoph Schnörr
TL;DR
This work develops a geometry-aware generative framework for discrete distributions by operating continuous normalizing flows on the assignment manifold $\mathcal{W}$ and embedding it into the meta-simplex $\mathcal{S}_N$. Training relies on Riemannian flow matching of $e$-geodesics, yielding representations of general discrete joint distributions as convex mixtures of extremal factorizing distributions. A key innovation is the meta-simplex embedding via $T(W)_{\alpha} = \prod_{i} W_{i,\alpha_i}$, which connects the manifold of simple distributions to the full joint distribution space while preserving a maximum-entropy property. Empirical results on image segmentation and likelihood-based diagnostics demonstrate accurate sample generation, efficient training, and effective out-of-distribution detection, highlighting the method's potential as a scalable alternative for discrete data modeling with principled information-geometric grounding.
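The embedding formula can be made concrete for a small case. The following is a minimal NumPy sketch, not the authors' implementation: it assumes $W$ is stored as a row-stochastic array of shape `(n, c)` (one categorical distribution per variable), and the names `embed`, `Wa`, `Wb`, and `mix` are hypothetical. It materializes the full joint tensor, which is only feasible for very small `n` and `c`.

```python
import numpy as np
from functools import reduce


def embed(W):
    """Return T(W) with T(W)[a_1, ..., a_n] = prod_i W[i, a_i].

    W is assumed to be a row-stochastic (n, c) array. The result has
    shape (c,) * n, so this is for illustration on tiny problems only.
    """
    # Successive outer products of the rows build the product distribution.
    return reduce(np.multiply.outer, W)


# Two extremal (one-hot) factorizing distributions over two binary variables.
# These are boundary points, used here only to illustrate the mixture idea.
Wa = np.array([[1.0, 0.0], [1.0, 0.0]])
Wb = np.array([[0.0, 1.0], [0.0, 1.0]])

# A convex mixture of factorizing distributions is in general not factorizing:
# here the two variables are perfectly correlated.
mix = 0.5 * embed(Wa) + 0.5 * embed(Wb)
```

Each `embed(W)` is a product distribution, while `mix` differs from the outer product of its own marginals, which is the sense in which mixtures of factorizing distributions can capture statistical dependencies.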
Abstract
This paper introduces a novel generative model for discrete distributions based on continuous normalizing flows on the submanifold of factorizing discrete measures. Integrating the flow gradually assigns categories and avoids the issues that arise when a latent continuous model is discretized, such as rounding and sample truncation. General non-factorizing discrete distributions, which can represent complex statistical dependencies of structured discrete data, can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions and by data-driven averaging. Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions. Various experiments underline the approach's broad applicability.
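To illustrate what a geodesic of factorizing discrete distributions looks like, the sketch below uses the standard information-geometric $e$-geodesic on the open simplex, which interpolates linearly in log-probabilities and then renormalizes, applied row by row. This is an assumption about the parametrization, not a transcription of the paper's training code; the names `e_geodesic`, `W0`, `W1`, `labels`, and the smoothing constant `eps` are hypothetical.

```python
import numpy as np


def e_geodesic(W0, W1, t):
    """Row-wise e-geodesic between strictly positive row-stochastic arrays.

    Each row follows p(t) proportional to p0**(1 - t) * p1**t, computed in
    log space and renormalized with a numerically stable softmax.
    """
    logits = (1.0 - t) * np.log(W0) + t * np.log(W1)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)


# Start: uniform rows (barycenter). Target: smoothed one-hot rows, kept
# strictly positive so that the logarithm is finite.
labels = np.array([2, 0, 3])
c, eps = 4, 1e-3
W0 = np.full((labels.size, c), 1.0 / c)
W1 = np.full((labels.size, c), eps / (c - 1))
W1[np.arange(labels.size), labels] = 1.0 - eps

# Points along the curve stay row-stochastic and move toward the labels.
path = [e_geodesic(W0, W1, t) for t in np.linspace(0.0, 1.0, 5)]
```

Along such a curve the probability mass of each row concentrates gradually on one category, which matches the description of the flow gradually assigning categories; in a flow-matching setup a vector field would be regressed onto the velocities of curves of this kind.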
