Exponential Family Variational Flow Matching for Tabular Data Generation

Andrés Guzmán-Cordero, Floor Eijkelboom, Jan-Willem van de Meent

TL;DR

TabbyFlow is a variational flow matching (VFM) method for tabular data generation. It is trained with an efficient, data-driven objective based on moment matching, which enables principled learning of probability paths over mixed continuous and discrete variables.

Abstract

While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite the ubiquity of such data in real-world applications. To this end, we develop TabbyFlow, a variational flow matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. This yields an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.
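The moment-matching idea for mixed feature types can be illustrated with a small sketch. It is not the authors' implementation: it assumes one continuous feature modeled as a unit-variance Gaussian (natural parameter equal to the mean) and one categorical feature modeled with logits as natural parameters, treated independently in the mean-field sense. All function names are hypothetical. For a single observed endpoint $x_1$, the gradient of the negative log-likelihood with respect to the natural parameters is the model's expected sufficient statistics minus the observed ones, and the code checks this against finite differences.

```python
import math


def softmax(eta):
    """Mean parameters (class probabilities) of a categorical with logits eta."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    z = sum(exps)
    return [e / z for e in exps]


def gaussian_nll(eta, x1):
    """-log q(x1) up to a constant, for a unit-variance Gaussian with mean eta."""
    return 0.5 * (x1 - eta) ** 2


def categorical_nll(eta, k):
    """-log q(x1 = k) for a categorical with logits eta."""
    m = max(eta)
    log_z = m + math.log(sum(math.exp(e - m) for e in eta))
    return log_z - eta[k]


def moment_residual(eta_cont, eta_cat, x_cont, k_cat):
    """Model moments minus observed sufficient statistics, feature by feature.

    Continuous feature: E_q[x1] - x1 (sufficient statistic is x1 itself).
    Categorical feature: softmax(eta) - onehot(k) (sufficient statistic is the one-hot vector).
    """
    cont = eta_cont - x_cont
    probs = softmax(eta_cat)
    cat = [p - (1.0 if j == k_cat else 0.0) for j, p in enumerate(probs)]
    return cont, cat


def finite_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def max_gradient_gap(eta_cont, eta_cat, x_cont, k_cat):
    """Largest absolute gap between the numerical NLL gradient and the moment residual."""
    cont, cat = moment_residual(eta_cont, eta_cat, x_cont, k_cat)
    gaps = [abs(finite_diff(lambda e: gaussian_nll(e, x_cont), eta_cont) - cont)]
    for j in range(len(eta_cat)):
        def nll_j(e, j=j):
            eta = list(eta_cat)
            eta[j] = e  # perturb only the j-th logit
            return categorical_nll(eta, k_cat)
        gaps.append(abs(finite_diff(nll_j, eta_cat[j]) - cat[j]))
    return max(gaps)


if __name__ == "__main__":
    # One mixed row: continuous value 1.5, categorical class index 2 out of 3.
    print(max_gradient_gap(0.3, [0.1, -0.4, 1.2], 1.5, 2))
```

In a training loop the natural parameters would be produced by a network evaluated at the interpolated point $x$ and time $t$, and averaging these per-sample residuals over endpoints drawn from the data gives a Monte Carlo estimate of the moment mismatch.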

Paper Structure

This paper contains 40 sections, 4 theorems, 37 equations, 1 figure, 9 tables.

Key Result

Proposition 3.1

Let $q_t^{\theta}(x_1 \mid x)$ be a variational distribution from an exponential family, parameterized by natural parameters $\eta^{\theta}_t(x)$, which depend on neural network parameters $\theta$. The gradient of the VFM objective $\nabla_{\theta} \mathcal{L}(\theta)$ is: where $\mu_t(x) = \mathbb{E}_{p_t(x_1 \mid x)}[\tau(x_1)]$ are the moments relative to $p_t(x_1 \mid x)$, and $\mu_t^\theta(
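The moment-matching form of this gradient can be sketched as follows. The sketch assumes the VFM objective is the expected negative log-likelihood of the variational distribution, that the exponential family is written in its standard form with base measure $h$ and log-partition function $A$ (both symbols are introduced here for illustration), and that $\mu_t^\theta(x)$ denotes the moments of $\tau(x_1)$ under $q_t^{\theta}(x_1 \mid x)$; the notation in the paper may differ.

$$
q_t^{\theta}(x_1 \mid x) = h(x_1)\exp\!\big(\eta_t^{\theta}(x)^\top \tau(x_1) - A(\eta_t^{\theta}(x))\big),
\qquad
\mathcal{L}(\theta) = -\mathbb{E}_{t,\,x,\,x_1}\big[\log q_t^{\theta}(x_1 \mid x)\big].
$$

Differentiating the log-density and using the exponential-family identity $\nabla_{\eta} A(\eta) = \mathbb{E}_{q}[\tau(x_1)]$ gives

$$
\nabla_{\theta} \log q_t^{\theta}(x_1 \mid x)
= \big(\nabla_{\theta}\, \eta_t^{\theta}(x)\big)^\top \big(\tau(x_1) - \mu_t^{\theta}(x)\big),
\qquad
\mu_t^{\theta}(x) = \mathbb{E}_{q_t^{\theta}(x_1 \mid x)}[\tau(x_1)].
$$

Taking the expectation over $x_1 \sim p_t(x_1 \mid x)$ replaces $\tau(x_1)$ by $\mu_t(x)$, so under these assumptions

$$
\nabla_{\theta} \mathcal{L}(\theta)
= -\,\mathbb{E}_{t,\,x}\Big[\big(\nabla_{\theta}\, \eta_t^{\theta}(x)\big)^\top \big(\mu_t(x) - \mu_t^{\theta}(x)\big)\Big],
$$

which vanishes when the model moments $\mu_t^{\theta}(x)$ match the posterior moments $\mu_t(x)$.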

Figures (1)

  • Figure 1: Exponential Family Variational Flow Matching (EF-VFM) is a generative modeling framework designed for mixed continuous and discrete variables. By leveraging the exponential family and a mean-field assumption, EF-VFM efficiently matches the sufficient statistics of the distributions via learned probability paths, ensuring state-of-the-art fidelity and diversity in synthetic data.

Theorems & Definitions (8)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition A.1
  • proof
  • Proposition A.1
  • proof