Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching

Etrit Haxholli, Yeti Z. Gürbüz, Oğul Can, Eli Waxman

TL;DR

The paper tackles the challenge of modeling discrete, categorical data with flow-based methods, where non-deterministic discrete paths prevent straightforward rectification and precise likelihood estimation. It introduces a dynamic optimal-transport objective for discrete flows with convex interpolants and proves a Kantorovich formulation that yields a categorical Benamou–Brenier-type theorem, with costs defined by inter-state similarity. Two practical upper bounds on perplexity are derived, a KL-based bound and an entropy-based bound, which generalize prior discrete-diffusion bounds and support principled training, evaluation, and model comparison. The authors further present Multimask Flows and show that minibatch OT reduces the required number of inference steps by up to 8x on GPT-2–sized models while preserving diversity. Empirical results on small proofs of concept and OpenWebText-scale tasks show substantial reductions in the number of jumps (state transitions) and competitive perplexity across settings, supporting the proposed framework and bounds as practical tools for discrete-flow modeling and for comparison with autoregressive and discrete diffusion baselines.

Abstract

Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectification strategy cannot be applied due to the stochasticity of discrete paths, necessitating alternative methods to minimize state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where transport cost depends solely on inter-state similarity and can be optimized via minibatch strategies. In the case of bag-of-words (BoW) sourced flows, we show that such methods can reduce the number of transitions by up to 8 times (1024 to 128) to reach the same generative perplexity without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an instantaneous change-of-variables analogue, preventing the precise probability estimation available to continuous flows. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity, particularly when utilizing minibatch Optimal Transport, without sacrificing diversity.

Paper Structure

This paper contains 43 sections, 9 theorems, 196 equations, 17 tables, 3 algorithms.

Key Result

Theorem 3.1

Let $\pi(x_0, x_1)$ be the joint distribution of $x_0$ and $x_1$, and let $p_t$ be a flow defined as in Equations (main_flow_def, pos_independ, convex_cond_flow) that transforms $p=\int \pi(x_0, x_1) dx_1$ into $q=\int \pi(x_0, x_1) dx_0$. In this setting, the dynamic formulation given in Equation ( where the cost function is $c(x_0,x_1)=\sum_{i=1}^L s(x_0^i, x_1^i)$.
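To make the cost $c(x_0,x_1)=\sum_{i=1}^L s(x_0^i, x_1^i)$ concrete, the following is a minimal sketch of a minibatch optimal-transport pairing, not the authors' implementation. It assumes that $s$ is the Hamming indicator (1 when two tokens differ, 0 otherwise), which is only one possible similarity-based choice, and it solves the assignment problem by brute force over permutations, which is feasible only for very small batches. The function names `hamming_cost` and `minibatch_ot_pairing` are hypothetical.

```python
from itertools import permutations


def hamming_cost(x0, x1):
    """Cost c(x0, x1) = sum_i s(x0^i, x1^i).

    Assumption: s(a, b) = 1 if a != b else 0 (Hamming indicator).
    """
    if len(x0) != len(x1):
        raise ValueError("sequences must have the same length L")
    return sum(a != b for a, b in zip(x0, x1))


def minibatch_ot_pairing(sources, targets):
    """Pair each source sequence with one target sequence so that the
    total cost over the minibatch is minimal.

    Brute force over all permutations: exact, but O(n!) in batch size n.
    Returns (perm, total) where sources[i] is paired with targets[perm[i]].
    """
    n = len(sources)
    if n != len(targets):
        raise ValueError("source and target batches must have equal size")
    # Pairwise cost matrix for the minibatch.
    cost = [[hamming_cost(s, t) for t in targets] for s in sources]
    best_perm, best_total = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Toy batch of three token sequences of length L = 3.
sources = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
targets = [[7, 8, 0], [1, 2, 3], [4, 0, 6]]

# Cost of the default (unpaired) coupling: sources[i] with targets[i].
independent_total = sum(hamming_cost(s, t) for s, t in zip(sources, targets))
perm, ot_total = minibatch_ot_pairing(sources, targets)

print("independent pairing cost:", independent_total)
print("minibatch OT pairing:", perm, "cost:", ot_total)
```

In this toy batch the re-paired coupling needs fewer token changes in total than the index-aligned pairing, which is the quantity the transport cost measures. For realistic batch sizes, an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the same cost matrix would replace the permutation loop.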

Theorems & Definitions (11)

  • Theorem 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Remark 3.4: On novelty and contribution
  • Theorem 4.1
  • Theorem 4.2
  • Proposition 4.3
  • Proposition 4.4
  • Proposition A.1
  • Proposition A.2
  • ...and 1 more