$α$-Flow: A Unified Framework for Continuous-State Discrete Flow Matching Models

Chaoran Cheng, Jiahan Li, Jiajun Fan, Ge Liu

TL;DR

This work presents a unified framework for continuous-state discrete flow matching (CS-DFM) models, under which existing variants can be understood as operating on different $\alpha$-representations of probabilities. It introduces $\alpha$-Flow, a family of CS-DFM models that adheres to the canonical $\alpha$-geometry of the statistical manifold, and shows that $\alpha$-Flow is optimal in minimizing the generalized kinetic energy.

Abstract

Recent efforts have extended the flow-matching framework to discrete generative modeling. One strand of models works directly with continuous probabilities instead of discrete tokens; we colloquially refer to these as Continuous-State Discrete Flow Matching (CS-DFM) models. Existing CS-DFM models differ significantly in their representations and geometric assumptions. This work presents a unified framework for CS-DFM models, under which the existing variants can be understood as operating on different $\alpha$-representations of probabilities. Building upon the theory of information geometry, we introduce $\alpha$-Flow, a family of CS-DFM models that adheres to the canonical $\alpha$-geometry of the statistical manifold, and demonstrate its optimality in minimizing the generalized kinetic energy. Theoretically, we show that the flow matching loss for $\alpha$-Flow establishes a unified variational bound for the discrete negative log-likelihood. We comprehensively evaluate different instantiations of $\alpha$-Flow on various discrete generation domains to demonstrate their effectiveness in discrete generative modeling, including intermediate values of $\alpha$ whose geometries have never been explored before. $\alpha$-Flow significantly outperforms its discrete-state counterpart in image and protein sequence generation and better captures entropy in language modeling.
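The unifying idea above is that CS-DFM variants differ in which $\alpha$-representation of the probability vector they operate on. The following is a minimal sketch, assuming the standard $\alpha$-representation from Amari's information geometry; Definition 3.1 in the paper may use a different scaling or a modified form, and the function names are hypothetical. Under this convention $\alpha = -1$ recovers the probabilities themselves, $\alpha = 0$ maps the simplex onto a sphere of radius 2 via $2\sqrt{p}$, and $\alpha = 1$ gives $\log p$.

```python
import numpy as np


def alpha_representation(p, alpha):
    """Amari-style alpha-representation of a probability vector (assumed convention).

    alpha = -1 -> p itself (mixture / linear coordinates)
    alpha =  0 -> 2 * sqrt(p) (points on a sphere of radius 2)
    alpha = +1 -> log p (exponential coordinates; -inf at zero entries)
    """
    p = np.asarray(p, dtype=float)
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    if np.isclose(alpha, 1.0):
        # Suppress the divide-by-zero warning for one-hot inputs; log(0) = -inf.
        with np.errstate(divide="ignore"):
            return np.log(p)
    return 2.0 / (1.0 - alpha) * p ** ((1.0 - alpha) / 2.0)


def inverse_alpha_representation(ell, alpha):
    """Map alpha-coordinates back to the (possibly unnormalized) measure."""
    ell = np.asarray(ell, dtype=float)
    if np.isclose(alpha, 1.0):
        return np.exp(ell)
    return ((1.0 - alpha) / 2.0 * ell) ** (2.0 / (1.0 - alpha))
```

For example, `alpha_representation(np.array([0.5, 0.3, 0.2]), -1.0)` returns the input unchanged, while the squared norm of the $\alpha = 0$ coordinates is always $4\sum_i p_i = 4$, which is why that case lives on a sphere.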

Paper Structure

This paper contains 45 sections, 11 theorems, 72 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.2

For all $\alpha\in[-1,1]$, the loss $\mathcal{L}^{(\alpha)}$ establishes a negative evidence lower bound (ELBO) for the discrete negative log-likelihood: where $\delta_1$ is the target one-hot distribution, $\gamma^{(\alpha)}$ is the $\alpha$-geodesic connecting $\mu_0$ to $\delta_1$, and $C$ is a non-negative constant that does not rely on the model parameter $\theta$.

Figures (7)

  • Figure 1: $\alpha$-geodesics defined by the exponential map (left) of the same base point and vector field, and the interpolation (right) between two fixed points on the 2-simplex.
  • Figure 2: Estimated densities using different variants of $\alpha$-flow and KL divergence to the ground truth density estimation.
  • Figure 3: Uncurated generated digits and FID scores (lower is better) on the binarized MNIST dataset.
  • Figure 4: pLDDT vs FED scores for different variants of $\alpha$-flow and MDLM (the best DS-DFM model).
  • Figure 5: Different modified $\alpha$-representations of a Bernoulli distribution.
  • ...and 2 more figures
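Figure 1 (right) interpolates between two fixed points on the 2-simplex. The paper's closed form for a general $\alpha$-geodesic is not reproduced here, so the sketch below covers only the three classical cases from information geometry, assuming they coincide with the $\alpha = -1, 0, 1$ members of the paper's family: the mixture (m-)geodesic, the Fisher-Rao great-circle geodesic in $\sqrt{p}$ coordinates, and the exponential (e-)geodesic. Function names are hypothetical, and `e_geodesic` needs strictly positive inputs because it works in $\log p$.

```python
import numpy as np


def m_geodesic(p, q, t):
    """alpha = -1: straight line (mixture) in probability coordinates."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return (1.0 - t) * p + t * q


def fisher_rao_geodesic(p, q, t):
    """alpha = 0: great-circle arc between sqrt(p) and sqrt(q), squared back."""
    u, v = np.sqrt(np.asarray(p, dtype=float)), np.sqrt(np.asarray(q, dtype=float))
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))  # angle between unit vectors
    if theta < 1e-6:
        return m_geodesic(p, q, t)  # p and q (almost) coincide
    # Spherical linear interpolation keeps the unit norm, so the squares sum to 1.
    w = (np.sin((1.0 - t) * theta) * u + np.sin(t * theta) * v) / np.sin(theta)
    return w ** 2


def e_geodesic(p, q, t):
    """alpha = +1: normalized geometric mixture (straight line in log p)."""
    log_w = (1.0 - t) * np.log(p) + t * np.log(q)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()
```

Evaluating each function on a grid of $t \in [0, 1]$ for the same pair of points gives three distinct curves with shared endpoints, all of which stay on the simplex.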

Theorems & Definitions (23)

  • Definition 3.1: $\alpha$-representation
  • Theorem 3.2: Negative ELBO
  • Proposition 3.3
  • proof: Proof of Theorem 3.2
  • Definition 3.4: $\alpha$-divergence
  • Proposition 3.5
  • Theorem 3.6: local optimality (bauer2024p, Corollary 3.11)
  • Corollary 3.7
  • Theorem 3.8: global optimality
  • Corollary 3.9
  • ...and 13 more
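Definition 3.4 ($\alpha$-divergence) is listed above without its formula. As a sketch, assuming Amari's standard $\alpha$-divergence between probability vectors (the paper may use a different sign convention or scaling, and `alpha_divergence` is a hypothetical name): for $\alpha \neq \pm 1$, $D_\alpha[p:q] = \frac{4}{1-\alpha^2}\sum_i \big(\tfrac{1-\alpha}{2}p_i + \tfrac{1+\alpha}{2}q_i - p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2}\big)$, and the limits $\alpha \to -1$ and $\alpha \to 1$ give $\mathrm{KL}(p\,\|\,q)$ and $\mathrm{KL}(q\,\|\,p)$.

```python
import numpy as np


def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence D_alpha[p : q] for strictly positive probability vectors.

    Assumed convention: alpha = -1 -> KL(p || q), alpha = +1 -> KL(q || p),
    alpha = 0 -> 2 * sum((sqrt(p) - sqrt(q)) ** 2).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, -1.0):
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 1.0):
        return float(np.sum(q * np.log(q / p)))
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    # The a*p + b*q terms keep the formula valid for unnormalized measures too.
    return float(4.0 / (1.0 - alpha ** 2) * np.sum(a * p + b * q - p ** a * q ** b))
```

At $\alpha = 0$ the divergence equals $2\sum_i(\sqrt{p_i}-\sqrt{q_i})^2$, and swapping the arguments corresponds to flipping the sign of $\alpha$; both identities make convenient sanity checks.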