Fisher Flow Matching for Generative Modeling over Discrete Data

Oscar Davis; Samuel Kessler; Mircea Petrache; İsmail İlkan Ceylan; Michael Bronstein; Avishek Joey Bose

TL;DR

The authors prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. The learned flows can additionally be bootstrapped with Riemannian optimal transport, which improves training dynamics.

Abstract

Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive, with more recent alternatives based on diffusion or flow matching falling short of the impressive performance these methods achieve in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the Fisher-Rao metric. As a result, we demonstrate that discrete data itself can be continuously reparameterised to points on the positive orthant of the $d$-hypersphere $\mathbb{S}^d_+$, which allows us to define flows that map any source distribution to a target in a principled manner by transporting mass along (closed-form) geodesics of $\mathbb{S}^d_+$. The learned flows in Fisher-Flow can also be bootstrapped by leveraging Riemannian optimal transport, leading to improved training dynamics. We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. We evaluate Fisher-Flow on an array of synthetic and diverse real-world benchmarks, including the design of DNA promoter and DNA enhancer sequences. Empirically, we find that Fisher-Flow improves over prior diffusion and flow-matching models on these benchmarks.
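The reparameterisation described in the abstract can be illustrated with a small NumPy sketch. It assumes the standard square-root map from the simplex to the unit sphere (the paper's sphere map may differ by a constant scaling of the radius) and uses great-circle interpolation as the closed-form geodesic. The function names `sphere_map`, `inverse_sphere_map`, and `sphere_geodesic` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def sphere_map(p):
    """Send a categorical distribution p (a point on the simplex) to the
    positive orthant of the unit sphere via the elementwise square root.
    Assumption: unit radius; the paper may use a different scaling."""
    return np.sqrt(np.asarray(p, dtype=float))

def inverse_sphere_map(x):
    """Send a point on the positive orthant of the unit sphere back to the simplex."""
    return np.asarray(x, dtype=float) ** 2

def sphere_geodesic(x0, x1, t):
    """Great-circle (spherical linear) interpolation between unit vectors x0 and x1."""
    cos_omega = np.clip(np.dot(x0, x1), -1.0, 1.0)
    omega = np.arccos(cos_omega)  # angle between the two endpoints
    if omega < 1e-8:
        # Endpoints coincide numerically, so the geodesic is constant.
        return np.array(x0, dtype=float)
    return (np.sin((1.0 - t) * omega) * x0 + np.sin(t * omega) * x1) / np.sin(omega)

# Two categorical distributions over three classes (points on the 2-simplex).
p0 = np.array([1 / 3, 1 / 3, 1 / 3])
p1 = np.array([0.9, 0.05, 0.05])

# Midpoint of the geodesic on the sphere, then mapped back to the simplex.
x_mid = sphere_geodesic(sphere_map(p0), sphere_map(p1), 0.5)
p_mid = inverse_sphere_map(x_mid)
```

Because the interpolation stays on the unit sphere and inside the positive orthant, squaring the midpoint again yields a valid probability vector, so no renormalisation step is needed in this sketch.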

Paper Structure (35 sections, 4 theorems, 26 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Background
Information geometry
Flow matching over Riemannian manifolds
Fisher Flow Matching
Reparameterising discrete data on the simplex
Flow Matching from $\mathring{\Delta}^d \to \mathbb{S}^d_+$ via the sphere map
The Fisher-Rao metric from Natural gradient descent
Fisher-Flow Matching with Riemannian optimal transport
Training Fisher-Flow
Experiments
Synthetic experiments
Promoter DNA sequence design
Enhancer DNA design
De novo molecule generation
...and 20 more sections

Key Result

Proposition 1

Assume that there exists a bounded Riemannian metric $g$ over $\Delta^d$ such that the parameterisation map $\theta\mapsto p=p(\theta)$ is Lipschitz and differentiable from $\Theta$ to $({\mathcal{P}}({\mathcal{M}}), W_{2,g})$. Then the "natural gradient" descent of the form: approximates, as $\epsilon\to 0^+$, the gradient flow of $\mathcal{L}$ on manifold $({\mathcal{P}}({\mathcal{M}}^d), W_{g_

Figures (5)

Figure 1: A geodesic connecting $x_0$ and $x_1$ using the FR metric on $\mathring{\Delta}^2$ and the corresponding path on $\mathbb{S}^2_+$.
Figure 2: Synthetic experiments on learning a distribution resembling a smiley face on $\mathring{\Delta}^2$.
Figure 3: Toy experiment from Stark et al. (2024). Minimal KL divergence over 5 seeds is reported.
Figure 3: Results on QM9. Higher is better. The baseline numbers are taken from the cited papers. The numbers reported for FlowMol are those for the uniform distribution and end-point parameterisation. Our numbers are for $1{,}000$ molecules.
Figure 4: Generated molecules using Fisher-Flow on QM9.

Theorems & Definitions (6)

Proposition 1
Proposition 2
Proposition 3: extended version of \ref{prop:fisher_kl}
proof
Proposition 4: extended version of \ref{prop:mongegeodesic}
proof
