Categorical Flow Matching on Statistical Manifolds

Chaoran Cheng; Jiahan Li; Jian Peng; Ge Liu

TL;DR

This work introduces Statistical Flow Matching (SFM), a geometry-aware generative framework on the statistical manifold of probability measures, with a focus on the simplex of categorical distributions. By leveraging the Fisher information metric and geodesic flows, SFM provides tractable exact likelihoods, obtains a numerically stable training objective through a diffeomorphism to the sphere, and incorporates optimal transport into training. The method is instantiated on categorical distributions, where the exponential and logarithm maps have closed forms, yielding training and sampling procedures that respect the intrinsic geometry of the manifold. Across toy and real-world discrete tasks spanning vision, language, and bioinformatics, SFM achieves better sampling quality and likelihood than existing discrete diffusion and flow models and performs competitively with autoregressive baselines on character-level generation. The framework draws a principled connection between Riemannian flow matching, information geometry, and natural gradient descent, with potential extensions to other families of probability measures and to non-discrete domains.
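The geometry described above can be illustrated with a minimal NumPy sketch. It assumes that the diffeomorphism π to the sphere is the square-root map μ ↦ √μ (a standard choice in information geometry; this summary does not state the formula) and uses the well-known closed form of the Fisher-Rao distance between categorical distributions, 2 arccos(Σᵢ √(μᵢνᵢ)). The function names `pi_map`, `pi_map_inv`, and `fisher_rao_distance` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def pi_map(mu):
    """Hypothetical name for the assumed diffeomorphism pi: simplex -> positive orthant of the unit sphere."""
    return np.sqrt(mu)

def pi_map_inv(x):
    """Inverse of the square-root map: squaring returns a probability vector because ||x||_2 = 1."""
    return x ** 2

def fisher_rao_distance(mu, nu):
    """Geodesic distance under the Fisher information metric between two categorical distributions.

    The square-root map pulls the round metric back to 1/4 of the Fisher metric, so the
    Fisher-Rao distance is twice the great-circle angle between pi(mu) and pi(nu).
    """
    cos_angle = np.clip(np.dot(pi_map(mu), pi_map(nu)), -1.0, 1.0)  # Bhattacharyya coefficient
    return 2.0 * np.arccos(cos_angle)

# Two pairs with the same Euclidean displacement (0.1, -0.1, 0):
a, b = np.array([0.40, 0.30, 0.30]), np.array([0.50, 0.20, 0.30])  # near the barycenter
c, d = np.array([0.01, 0.11, 0.88]), np.array([0.11, 0.01, 0.88])  # near the boundary
for p, q in [(a, b), (c, d)]:
    print(np.linalg.norm(p - q), fisher_rao_distance(p, q))
```

Both pairs are equally far apart in Euclidean terms, but the pair near the boundary of the simplex is more than twice as far apart under the Fisher-Rao metric, which is the kind of non-Euclidean behavior a geometry-aware model is meant to exploit.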

Abstract

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures, inspired by results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions, whose geometric properties have been left unexplored by previous discrete generative models. Using the Fisher information metric, we equip the manifold with a Riemannian structure and exploit its intrinsic geometry by following geodesics, i.e., shortest paths. We develop an efficient training and sampling algorithm that overcomes numerical instability through a diffeomorphism between manifolds. Our geometric perspective on statistical manifolds allows us to apply optimal transport during training and to interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM admits exact likelihood calculation for arbitrary probability measures. We show that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks spanning image, text, and biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.
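The training and sampling procedure can be sketched on the sphere under several assumptions: π is the square-root map, conditional paths are great-circle geodesics (the standard Riemannian flow matching construction), and the conditional target is the constant-speed velocity u_t = log_{x_t}(x_1) / (1 - t). In training, a network evaluated at (x_t, t) would be regressed onto u_t with a squared-norm loss. `vector_field` is a hypothetical stand-in for that learned network, and the paper's exact parameterization, loss weighting, and time schedule may differ.

```python
import numpy as np

def sphere_log(x, y, eps=1e-12):
    """Logarithm map on the unit sphere: tangent vector at x toward y whose norm is the arc length."""
    cos_theta = np.clip(np.dot(x, y), -1.0, 1.0)
    v = y - cos_theta * x                      # component of y orthogonal to x
    norm_v = np.linalg.norm(v)
    if norm_v < eps:                           # x and y (numerically) coincide
        return np.zeros_like(x)
    return np.arccos(cos_theta) * v / norm_v

def sphere_exp(x, v, eps=1e-12):
    """Exponential map on the unit sphere: move along the great circle from x with velocity v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x.copy()
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v

def cfm_training_pair(mu0, mu1, t):
    """Point x_t on the geodesic from pi(mu0) to pi(mu1) and the conditional target velocity u_t."""
    x0, x1 = np.sqrt(mu0), np.sqrt(mu1)        # assumption: pi(mu) = sqrt(mu)
    x_t = sphere_exp(x0, t * sphere_log(x0, x1))
    u_t = sphere_log(x_t, x1) / (1.0 - t)      # constant-speed velocity toward x1, valid for t < 1
    return x_t, u_t

def sample(vector_field, mu0, n_steps=100):
    """Euler integration on the sphere via the exponential map, then pi^{-1}(x) = x**2."""
    x = np.sqrt(mu0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = sphere_exp(x, dt * vector_field(x, k * dt))
    return x ** 2
```

Each Euler step moves along the sphere with the exponential map rather than adding the velocity in ambient coordinates, so iterates stay on the manifold. Using the exact conditional field toward a one-hot target as an oracle, the integration follows the geodesic and lands on the corresponding vertex of the simplex, where the one-hot point is reached without dividing by zero probabilities.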

Paper Structure (43 sections, 2 theorems, 50 equations, 6 figures, 5 tables, 3 algorithms)

Introduction
Preliminary
  Information Geometry
  Conditional Flow Matching on Riemannian Manifold
Method
  Statistical Manifold of Categorical Distributions
  Statistical Flow Matching
  Optimization View of Statistical Flow Matching
  Optimal Transport on Statistical Manifold
  Exact Likelihood Calculation
Experiments
  Toy Example: Swiss Roll on Simplex
  Binarized MNIST
  Text8
  Promoter Design
...and 28 more sections

Key Result

Proposition 1

Figures (6)

Figure 1: The Riemannian geometry of the statistical manifold of categorical distributions compared with Euclidean geometry on the simplex. Left: Contours of geodesic distance to $\mu_0=(1/3,1/3,1/3)$. Middle: Exponential maps (geodesics) from $\mu_0$ to different points near the boundary. Right: Logarithm maps (vector fields) to $\mu_0$.
Figure 2: Statistical flow matching (SFM) framework. (a) During training (Sec. Statistical Flow Matching), probability measures on $\mathcal{P}$ are mapped to $S^{n-1}_+$ via the diffeomorphism $\pi$ to compute the time-dependent vector field (in red). During inference, the learned vector field generates a trajectory on $S^{n-1}_+$, and the ODE solution is mapped back to $\mathcal{P}$ (in blue). (b) In the NLL calculation for one-hot examples (Sec. Exact Likelihood Calculation), the probability density is marginalized over a small neighborhood of the corresponding Dirac measure to avoid undefined behavior at the boundary (in green).
Figure 3: Generated samples on the Swiss-roll-on-simplex dataset, with NLL (lower is better). NLLs are estimated with Hutchinson's trace estimator; values in parentheses are exact.
Figure 4: SP-MSE (as evaluated by Sei; Chen et al., 2022) on the generated promoter DNA sequences. Results marked * are from Avdeyev et al. (2023), and results marked † are from Stark et al. (2024).
Figure 5: GPT-J-6B NLL versus sample entropy. For MultiFlow, D3PM, and the autoregressive baseline, each curve traces logit temperatures from 0.5 to 1. Baseline data are from Campbell et al. (2024).
...and 1 more figures
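The NLL values in Figure 3 depend on the divergence of the learned vector field: in continuous normalizing flows, the log-density evolves as d log p_t(x_t)/dt = -div u_t(x_t), and Hutchinson's estimator replaces the exact trace of the Jacobian with the randomized quantity εᵀJε for Rademacher probes ε. The sketch below works in plain Euclidean coordinates and approximates Jacobian-vector products with central finite differences; both are simplifying assumptions (a real implementation would use an autodiff JVP and the divergence on the manifold), and the function names are hypothetical.

```python
import numpy as np

def hutchinson_divergence(f, x, n_samples=1, h=1e-4, rng=None):
    """Unbiased estimate of div f(x) = tr(J_f(x)) using Rademacher probes.

    Jacobian-vector products are approximated by central finite differences
    (an assumption for this sketch; an autodiff JVP would normally be used).
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        eps = rng.choice([-1.0, 1.0], size=x.shape)            # Rademacher probe
        jvp = (f(x + h * eps) - f(x - h * eps)) / (2.0 * h)    # approx J_f(x) @ eps
        total += float(eps @ jvp)                               # eps^T J eps
    return total / n_samples

def full_trace_divergence(f, x, h=1e-4):
    """Deterministic trace of the Jacobian using one finite-difference JVP per coordinate."""
    n = x.shape[0]
    basis = np.eye(n)
    return sum(((f(x + h * basis[i]) - f(x - h * basis[i])) / (2.0 * h))[i] for i in range(n))
```

The randomized estimate needs one JVP per probe regardless of dimension, whereas the full trace needs one per coordinate; this trade-off is why Hutchinson estimates are commonly reported alongside exact values when the dimension is small enough to afford both.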

Theorems & Definitions (3)

Proposition 1
Proposition 2
Proof
