Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data

Bastian Boll, Daniel Gonzalez-Alvarado, Stefania Petra, Christoph Schnörr

TL;DR

This work tackles the challenge of modeling joint distributions over large discrete alphabets by embedding factorizing distributions onto a metrical manifold and transporting a simple reference measure through randomized assignment flows. The core idea is to represent any discrete distribution as an expectation of factorizing components via the embedding $T(W)$ and a learned flow on the assignment manifold, trained with simulation-free Riemannian flow matching on geodesics. Key contributions include the introduction of an infinite-time transport to mitigate boundary issues, a principled way to couple many discrete variables through the submanifold geometry, and empirical demonstrations on class scaling and image-segmentation tasks that show improved scalability and interpolation capability. The framework yields efficient sampling and likelihood evaluation for discrete data while leveraging information-geometric structure, making it suitable for structured prediction in large-class settings.

Abstract

We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables. The approach uses measure transport by randomized assignment flows on the statistical submanifold of factorizing distributions, which makes it possible to represent and efficiently sample from any target distribution and to assess the likelihood of unseen data points. The complexity of the target distribution depends only on the parametrization of the affinity function of the dynamical assignment flow system. Our model can be trained in a simulation-free manner by conditional Riemannian flow matching, using the training data encoded in closed form as geodesics on the assignment manifold with respect to the e-connection of information geometry. Numerical experiments devoted to distributions of structured image labelings demonstrate the applicability to large-scale problems, which may also arise for discrete distributions in other application areas. Performance measures show that our approach scales better with an increasing number of classes than recent related work.
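
The abstract mentions training data encoded in closed form as e-geodesics. As a minimal sketch, assuming only the standard information-geometric fact that e-geodesics of the probability simplex are straight lines in log-coordinates (a normalized geometric interpolation), the Python snippet below evaluates such a curve on each row of a row-stochastic matrix, one simplex per discrete variable. The function names are hypothetical, and how the paper maps a training label to a geodesic endpoint is not reproduced here.

```python
import math


def softmax(x):
    """Numerically stable softmax of a list of floats."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    z = sum(e)
    return [ei / z for ei in e]


def e_geodesic(q0, q1, t):
    """e-geodesic through interior points q0 (t = 0) and q1 (t = 1) of a simplex.

    It is a straight line in log-coordinates, i.e. the normalized geometric
    interpolation q0**(1 - t) * q1**t, and it is defined for every real t.
    """
    return softmax([(1.0 - t) * math.log(a) + t * math.log(b)
                    for a, b in zip(q0, q1)])


def e_geodesic_rows(W0, W1, t):
    """Apply the e-geodesic row by row: one simplex per discrete variable."""
    return [e_geodesic(r0, r1, t) for r0, r1 in zip(W0, W1)]


# Hypothetical example: n = 2 variables, c = 3 labels, starting at the barycenter.
barycenter = [1.0 / 3] * 3
W1 = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
for t in (0.0, 0.5, 1.0, 5.0):
    print(t, e_geodesic_rows([barycenter, barycenter], W1, t))
```

Because the curve is defined for all real $t$, evaluating it at large $t$ drives each row toward a vertex of its simplex.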

Paper Structure (42 sections, 6 theorems, 108 equations, 10 figures, 1 table)

Key Result

Proposition 2.2

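For every $W\in\mathcal{W}_{c}$, the distribution $T(W)\in\mathcal{S}_{N}$ has maximum entropy among all $p\in\mathcal{S}_{N}$ subject to the marginal constraint that the marginal distribution of each variable $i\in[n]$ under $p$ equals the $i$-th row $W_{i}$ of $W$, i.e., $\sum_{\alpha:\,\alpha_{i}=j} p_{\alpha} = W_{ij}$ for all $i\in[n]$, $j\in[c]$.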
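
A small numerical illustration of Proposition 2.2 follows. It assumes that the embedding has the product form $T(W)_{\alpha}=\prod_{i}W_{i\alpha_{i}}$, so that the marginals of $T(W)$ are the rows of $W$. The sketch perturbs $T(W)$ by a hypothetical zero-marginal term, which keeps all marginals fixed, and checks that the entropy decreases. Helper names and the perturbation are illustrative choices, not taken from the source.

```python
import math
from itertools import product


def embed_T(W):
    """Assumed product form of the embedding T: rows W[i] (distributions over
    c labels) map to the joint T(W)[alpha] = prod_i W[i][alpha[i]],
    with multi-indices alpha in [c]^n."""
    n, c = len(W), len(W[0])
    return {alpha: math.prod(W[i][a] for i, a in enumerate(alpha))
            for alpha in product(range(c), repeat=n)}


def marginal(p, i, c):
    """Marginal distribution of variable i under the joint p (a dict)."""
    m = [0.0] * c
    for alpha, value in p.items():
        m[alpha[i]] += value
    return m


def entropy(p):
    """Shannon entropy (in nats) of a distribution given as a dict."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


W = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pT = embed_T(W)

# Hypothetical perturbation eps * u[a0] * w[a1]: u and w sum to zero, so both
# marginals of q coincide with those of pT, and q stays strictly positive.
u, w, eps = [1, -1, 0], [0, 1, -1], 0.01
q = {a: pT[a] + eps * u[a[0]] * w[a[1]] for a in pT}

print(marginal(q, 0, 3), marginal(q, 1, 3))
print(entropy(pT), entropy(q))  # entropy(q) < entropy(pT)
```

Since the feasible set is convex and entropy is strictly concave, any such perturbation that stays in the simplex must lower the entropy; the check above is one instance, not a proof.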

Figures (10)

  • Figure 1.1: (a) The simplex $\Delta_{N}$ for $N=4$, depicted in local coordinates, and the submanifold of factorizing discrete distributions, which connects all extreme points of $\Delta_{4}$. (b) Visualization of 1000 samples from the target distribution $p(\alpha_{1},\alpha_{2})$, corresponding to the blue point $p\in\Delta_{4}$. Each sample corresponds to an integral curve of a flow that evolves on the submanifold and can be computed efficiently by geometric integration. The parametrized vector field of the dynamical system that generates the flow has been trained by matching the flow to geodesics on the submanifold that encode given training data. As a result, each component $p_{\alpha}$ of the target distribution corresponds to the relative frequency of integral curves converging to the vertex $e_{\alpha}$, so that the entire distribution $p$ is represented by the convex combination $\sum_{\alpha} p_{\alpha} e_{\alpha} = p$. In this way, the flow realizes the pushforward of a simple reference distribution, centered at $0$ in the tangent space at the barycenter (red point), to the discrete target distribution $p$. Figure 3.1 provides a more detailed illustration of the approach.
  • Figure 3.1: Overview of the approach: The standard Gaussian reference measure $\mathcal{N}(0,I)$ is pushed forward by the lifting map $\exp_{W}$ from the flat tangent product space $\mathcal{T}_{0}$ to the assignment manifold $\mathcal{W}_{c}$, and further to the meta-simplex $\mathcal{S}_{N}$ via the embedding map $T$, by geometrically integrating the assignment flow equation. Since the assignment flow converges to the extreme points of $\overline{\mathcal{W}_{c}}$, which after embedding agree with the extreme points of $\Delta_{N}=\overline{\mathcal{S}_{N}}$, an approximation $\widetilde{p}(\alpha)$ of a general discrete target measure $p(\alpha)$ can be learned in terms of a corresponding convex combination of extreme points. This is achieved by matching the flow of e-geodesics, which encode given training samples, to the generating assignment flow in empirical expectation, and by learning the parameters of the affinity function $F_{\theta}$. Since only factorizing distributions $T(W)$, $W\in\mathcal{W}_{c}$, are required, the approach remains computationally feasible in high dimensions.
  • Figure 3.2: Influence of the parameter $\lambda$, which controls the rate at which the pushforward probability measure assigns mass to a target label (in the tangent Gaussian path and in the conditional vector field model, respectively), depending on the number $c$ of labels (classes, categories).
  • Figure 3.3: Norms $\|v(s)\|$ of the tangent vectors $v(s) = \exp_{\mathbb{1}_{\mathcal{S}}}^{-1}(p(s))$ with $p(s)=(\frac{s-1}{s},\frac{1}{(c-1) s},\dotsc,\frac{1}{(c-1) s})$, where $p(s) \to e_{1}\in\mathbb{R}^{c}$ as $s\to\infty$, for numbers of labels $c\in\{3,10,100,1000\}$. Since $\|e_{1}-p(s)\|=(\frac{c}{c-1})^{1/2}\frac{1}{s}\approx \frac{1}{s}$, the image $\exp_{\mathbb{1}_{\mathcal{S}}}(B_{0}(r)) \subset \mathcal{S}_{c}$ of the ball $B_{0}(r)\subset T_{0}$ of tangent vectors with radius $r = 15$ centered at $0\in T_{0}$ covers the simplex $\Delta_{c}$ up to a very small distance to its boundary.
  • Figure 4.1: Relative entropy between learned models (histogram of 512k samples) and a known, factorizing target distribution on $n=4$ simplices with a varying number of classes $c$. By leveraging information geometry and gradual decision-making over time, our proposed approach (red) outperforms our earlier method Boll:2024ab as well as Dirichlet flow matching Stark:2024aa in terms of scaling to many classes $c$.
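
The pipeline of Figures 1.1 and 3.1 (sample in the tangent space, lift, integrate, read off a vertex, count) can be sketched in plain Python. This is a minimal sketch under several assumptions: the lifting map is taken as $\exp_{p}(v)=p\,e^{v}/\langle p,e^{v}\rangle$ applied per row, the flow is integrated with a geometric Euler scheme $W_{k+1}=\exp_{W_{k}}(hF(W_{k}))$, and the learned affinity $F_{\theta}$ is replaced by the hypothetical toy choice $F(W)=W$, which merely drives each row to a vertex. Without training, the resulting relative frequencies reflect only this toy flow, not a target $p$.

```python
import math
import random
from collections import Counter


def lift(p, v):
    """Assumed lifting map exp_p(v) = p * e^v / <p, e^v> on one simplex.
    Adding a constant to v does not change the result."""
    m = max(v)
    w = [pi * math.exp(vi - m) for pi, vi in zip(p, v)]
    z = sum(w)
    return [wi / z for wi in w]


def sample_tangent(c, rng):
    """Standard Gaussian in R^c projected onto T_0 = {v : sum(v) = 0}."""
    g = [rng.gauss(0.0, 1.0) for _ in range(c)]
    mean = sum(g) / c
    return [gi - mean for gi in g]


def toy_affinity(W):
    """HYPOTHETICAL stand-in for the learned affinity F_theta: F(W) = W."""
    return W


def integrate(W, F, h=1.0, steps=200):
    """Geometric Euler steps W <- exp_W(h * F(W)), applied row by row."""
    for _ in range(steps):
        FW = F(W)
        W = [lift(row, [h * f for f in frow]) for row, frow in zip(W, FW)]
    return W


def sample_label(n, c, F, rng, steps=200):
    """Push one reference sample through lifting and integration; return the
    nearest vertex (one label per variable) together with the endpoint."""
    barycenter = [1.0 / c] * c
    W = [lift(barycenter, sample_tangent(c, rng)) for _ in range(n)]
    W = integrate(W, F, steps=steps)
    alpha = tuple(max(range(c), key=row.__getitem__) for row in W)
    return alpha, W


def relative_frequencies(labels):
    """Weights of the convex combination sum_alpha p~_alpha e_alpha."""
    counts = Counter(labels)
    return {alpha: k / len(labels) for alpha, k in counts.items()}


# n = 2 variables with c = 2 labels each, i.e. N = 4 as in Figure 1.1(a).
rng = random.Random(0)
labels = [sample_label(2, 2, toy_affinity, rng)[0] for _ in range(500)]
print(relative_frequencies(labels))
```

Replacing `toy_affinity` by a trained, parametrized affinity is what would make these frequencies approximate a target $p$; that training step is not sketched here.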
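
The quantities in Figure 3.3 can be checked numerically. The sketch below assumes that the inverse lifting map at the barycenter is $\exp_{\mathbb{1}_{\mathcal{S}}}^{-1}(q)=\Pi_{0}\log q$ (subtracting the mean of $\log q$), a convention common in the assignment-flow literature. Under that assumption, a short calculation (ours, not the source's) gives $\|v(s)\| = \sqrt{(c-1)/c}\,\big|\log\big((s-1)(c-1)\big)\big|$, which the code compares against direct evaluation and inverts to find the parameter $s$ reached at radius $r=15$.

```python
import math


def inv_lift_barycenter(q):
    """Assumed inverse lifting map at the barycenter: v = Pi_0 log q, i.e.
    log q minus its mean, which lies in T_0 = {v : sum(v) = 0}."""
    logs = [math.log(qi) for qi in q]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def p_of_s(c, s):
    """The curve p(s) of Figure 3.3 (for s > 1); p(s) -> e_1 as s -> infinity."""
    return [(s - 1.0) / s] + [1.0 / ((c - 1) * s)] * (c - 1)


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def closed_form_norm(c, s):
    """||v(s)|| = sqrt((c-1)/c) * |log((s-1)(c-1))|, under the assumption above."""
    return math.sqrt((c - 1) / c) * abs(math.log((s - 1.0) * (c - 1)))


def s_at_radius(c, r):
    """Invert closed_form_norm on the branch s >= c/(c-1): ||v(s)|| = r."""
    return 1.0 + math.exp(r * math.sqrt(c / (c - 1))) / (c - 1)


for c in (3, 10, 100, 1000):
    s = s_at_radius(c, 15.0)
    dist = math.sqrt(c / (c - 1)) / s  # ||e_1 - p(s)|| as stated in the caption
    print(f"c={c:5d}  s={s:.3e}  distance to e_1={dist:.2e}")
```

Under the assumed inverse lifting map, for $c=1000$ the radius $r=15$ corresponds to $s\approx 3.3\cdot 10^{3}$, i.e., a distance of roughly $3\cdot 10^{-4}$ from $e_{1}$; for smaller $c$ the same radius reaches much closer to the vertex.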
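
The evaluation metric of Figure 4.1 (relative entropy between a histogram of model samples and a known factorizing target on $n=4$ variables) can be sketched as follows. The direction KL(histogram ‖ target) is an assumption, as are the marginals below; samples drawn from the target itself stand in for model samples, so the resulting value only reflects finite-sample error.

```python
import math
import random
from collections import Counter
from itertools import product


def factorizing_target(marginals):
    """Joint p(alpha) = prod_i marginals[i][alpha[i]] over alpha in [c]^n."""
    c = len(marginals[0])
    return {alpha: math.prod(m[a] for m, a in zip(marginals, alpha))
            for alpha in product(range(c), repeat=len(marginals))}


def sample_factorizing(marginals, num, rng):
    """Draw num joint samples by sampling every variable independently."""
    labels = range(len(marginals[0]))
    return [tuple(rng.choices(labels, weights=m)[0] for m in marginals)
            for _ in range(num)]


def kl_histogram_to_target(samples, target):
    """KL(histogram || target). This direction is an assumption; it stays
    finite even when some histogram bins are empty."""
    counts = Counter(samples)
    num = len(samples)
    return sum((k / num) * math.log((k / num) / target[alpha])
               for alpha, k in counts.items())


# Hypothetical marginals for n = 4 variables with c = 3 classes each.
marginals = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.5, 0.25, 0.25]]
target = factorizing_target(marginals)
rng = random.Random(0)
model_samples = sample_factorizing(marginals, 50_000, rng)  # stand-in for model output
print(kl_histogram_to_target(model_samples, target))
```

Swapping `model_samples` for samples produced by a trained model gives the kind of comparison plotted in Figure 4.1, up to the assumed KL direction.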

Theorems & Definitions (14)

  • Example 2.1
  • Proposition 2.2: Boll:2024aa
  • Lemma 3.1: convex combination of embedded nodewise measures
  • proof
  • Proposition 3.2: conditional vector fields
  • proof
  • Proposition 3.3: conditional path constraints
  • proof
  • Lemma 3.4: orthogonal projection onto $\mathop{\mathrm{img}}\nolimits(Q)\cap \mathcal{T}_0\mathcal{S}_N$
  • Theorem 3.5: projected flow matching on $\mathcal{S}_N$