Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces

Tobias Schröder, Zijing Ou, Yingzhen Li, Andrew B. Duncan

TL;DR

This work addresses the challenge of training energy-based models on discrete and mixed-state data by introducing Energy Discrepancy (ED), a contrastive loss that requires only energy evaluations on data and perturbed samples, thereby removing the need for MCMC. ED relies on discrete diffusion on structured spaces, implemented as a heat equation on graphs with rate matrices $R$ (e.g., uniform, cyclical, ordinal, and absorbing structures), and provides an MCMC-free route to maximum-likelihood-like training as the diffusion time $t$ grows. The authors extend ED to tabular data by combining geometric perturbations for discrete features with Gaussian noise for numeric features, and demonstrate strong performance across discrete density estimation, tabular data synthesis (including calibration tasks), and discrete image modelling, often at lower computational cost than methods based on contrastive divergence (CD). The approach yields robust generation and improved calibration on real-world datasets, suggesting practical impact for synthetic data generation, data imputation, and calibrated classification in tabular and mixed data domains. The work also offers theoretical guarantees linking ED to maximum-likelihood estimation in the limit of large $t$ and provides scalable, parallelizable procedures via eigendecompositions of structure-specific rate matrices.

Central to the methodology are the energy-based model $p_{\theta}(x) \propto \exp(-U_{\theta}(x))$ and the ED loss
$$\mathrm{ED}_q(p_{\mathrm{data}},U) = \mathbb{E}_{p_{\mathrm{data}}(x)}[U(x)] - \mathbb{E}_{p_{\mathrm{data}}(x)}\mathbb{E}_{q(y|x)}[U_q(y)],$$
with $U_q(y) = -\log \sum_{x'} q(y|x') e^{-U(x')}$.
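
The ED loss above can be evaluated exactly when the state space is small enough to enumerate. The following Python sketch does this for a single categorical variable with $S$ states, assuming the perturbation is supplied as a row-stochastic matrix `Q` with `Q[x, y]` $= q(y\vert x)$. The function name `ed_exact` and the toy energies and data are hypothetical; for high-dimensional data the sum over $x'$ is not tractable and has to be estimated from perturbed samples instead.

```python
import numpy as np


def ed_exact(U, Q, p_data):
    """Exact energy discrepancy ED_q(p_data, U) on states {0, ..., S-1}.

    U      : (S,) energy values U(x)
    Q      : (S, S) row-stochastic matrix with Q[x, y] = q(y | x)
    p_data : (S,) data distribution
    """
    # Shift energies so the largest term exp(-(U - m)) equals 1 (numerical stability).
    m = U.min()
    w = np.exp(-(U - m))
    # U_q(y) = -log sum_{x'} q(y | x') exp(-U(x')), where (Q.T @ w)[y] = sum_{x'} Q[x', y] w[x']
    U_q = m - np.log(Q.T @ w)
    # E_{p_data(x)}[U(x)] - E_{p_data(x)} E_{q(y|x)}[U_q(y)]
    return p_data @ U - p_data @ (Q @ U_q)


rng = np.random.default_rng(0)
S = 5
U = rng.normal(size=S)                  # hypothetical energies
p_data = rng.dirichlet(np.ones(S))      # hypothetical data distribution
Q = 0.7 * np.eye(S) + 0.3 / S           # keep x w.p. 0.7, otherwise redraw uniformly
print(ed_exact(U, Q, p_data))
print(ed_exact(U, np.eye(S), p_data))   # no perturbation: the loss is approximately 0
```

Shifting all energies by a constant leaves the loss unchanged, since $U_q$ shifts by the same constant; this mirrors the fact that the normalising constant of $p_\theta$ never has to be computed.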

Abstract

Energy-based models (EBMs) offer a flexible framework for probabilistic modelling across various data domains. However, training EBMs on data in discrete or mixed state spaces poses significant challenges due to the lack of robust and fast sampling methods. In this work, we propose to train discrete EBMs with Energy Discrepancy, a loss function which only requires the evaluation of the energy function at data points and their perturbed counterparts, thus eliminating the need for Markov chain Monte Carlo. We introduce perturbations of the data distribution by simulating a diffusion process on the discrete state space endowed with a graph structure. This allows us to inform the choice of perturbation from the structure of the modelled discrete variable, while the continuous time parameter enables fine-grained control of the perturbation. Empirically, we demonstrate the efficacy of the proposed approaches in a wide range of applications, including the estimation of discrete densities with non-binary vocabulary and binary image modelling. Finally, we train EBMs on tabular data sets with applications in synthetic data generation and calibrated classification.
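
The abstract describes perturbations generated by a diffusion on a graph-structured state space, with the time $t$ controlling their strength. A minimal sketch, assuming unit jump rates: for a rate matrix $R$ whose rows sum to zero, the transition probabilities are $q_t = \exp(tR)$, and for a symmetric $R = V\Lambda V^\top$ they can be computed as $V e^{t\Lambda} V^\top$, which is why eigendecompositions of the structure-specific rate matrices are useful. The function names `rate_matrix`, `transition_matrix` and `perturb`, the unit rates, and the encoding of the absorbing mask as an extra state with index $S$ are hypothetical choices for this example; the paper's normalisation may differ. The absorbing rate matrix is not symmetric, so the sketch falls back to `scipy.linalg.expm` for it.

```python
import numpy as np
from scipy.linalg import expm


def rate_matrix(S, structure):
    """One possible rate matrix R (rows sum to zero) for S states, assuming unit rates."""
    if structure == "uniform":
        # Jump to any state at rate 1/S: R = J/S - I
        return np.full((S, S), 1.0 / S) - np.eye(S)
    if structure in ("cyclical", "ordinal"):
        A = np.zeros((S, S))
        idx = np.arange(S - 1)
        A[idx, idx + 1] = A[idx + 1, idx] = 1.0  # path graph: neighbours i and i+1
        if structure == "cyclical":
            A[0, S - 1] = A[S - 1, 0] = 1.0      # close the cycle
        return A - np.diag(A.sum(axis=1))        # negative graph Laplacian
    if structure == "absorbing":
        # Extra state with index S plays the role of the mask.
        R = np.zeros((S + 1, S + 1))
        R[:S, S] = 1.0                            # every state jumps to the mask at rate 1
        R[:S, :S] -= np.eye(S)
        return R
    raise ValueError(f"unknown structure: {structure}")


def transition_matrix(R, t):
    """q_t = exp(tR); uses the eigendecomposition when R is symmetric."""
    if np.allclose(R, R.T):
        lam, V = np.linalg.eigh(R)
        return (V * np.exp(t * lam)) @ V.T        # V diag(e^{t lam}) V^T
    return expm(t * R)


def perturb(x, Q, rng):
    """Draw y_i ~ q_t(. | x_i) for an integer array x of current states."""
    cdf = np.cumsum(Q[x], axis=1)
    cdf[:, -1] = 1.0                              # guard against round-off in the last bin
    u = rng.random(len(x))[:, None]
    return (u < cdf).argmax(axis=1)               # first bin whose cdf exceeds u


rng = np.random.default_rng(0)
Q = transition_matrix(rate_matrix(10, "ordinal"), t=0.5)
y = perturb(np.full(5, 4), Q, rng)                # five perturbed copies of the ordinal value 4
```

Increasing `t` spreads the perturbed values further along the graph, which is the fine-grained control of the perturbation that the continuous time parameter provides.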

Paper Structure

This paper contains 34 sections, 6 theorems, 52 equations, 12 figures, 12 tables, 2 algorithms.

Key Result

Theorem 1

Let $q_t(\cdot\vert x)$ be a Markov transition density defined by the rate matrix $R$ with eigenvalues $0 = \lambda_1(R) \geq \lambda_2(R) \geq \dots \geq \lambda_S(R)$ and uniform stationary distribution. Then there exists a constant $z_t$ independent of $\theta$ such that energy discrepancy converges to the loss of maximum-likelihood estimation, $\mathcal{L}_{\mathrm{MLE}}(\theta) := -\mathbb E_{p_{\mathrm{data}}(x)}[\log p_\theta(x)]$, …
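
A quick numerical illustration of Theorem 1, as a sketch under assumed choices: a cyclical rate matrix with unit rates on $S = 8$ states (its stationary distribution is uniform), a handful of random energies and a random data distribution. For each $t$ it measures how much $\mathrm{ED}_{q_t}(p_{\mathrm{data}}, U) - \mathcal{L}_{\mathrm{MLE}}$ varies across the energies; if the offset does not depend on the energy, this spread should shrink as $t$ grows, roughly at the rate set by the spectral gap $|\lambda_2(R)|$. In the limit where $q_t(y\vert x) = 1/S$ for all $x, y$, substituting into the ED formula gives exactly $\mathrm{ED} = \mathcal{L}_{\mathrm{MLE}} - \log S$. The helper names are hypothetical.

```python
import numpy as np


def ed_exact(U, Q, p):
    """Exact ED on a finite state space; Q[x, y] = q(y | x)."""
    m = U.min()
    U_q = m - np.log(Q.T @ np.exp(-(U - m)))    # U_q(y) = -log sum_x' q(y|x') e^{-U(x')}
    return p @ U - p @ (Q @ U_q)


def mle_loss(U, p):
    """-E_p[log p_U(x)] for p_U(x) proportional to exp(-U(x))."""
    m = U.min()
    return p @ U + (-m + np.log(np.exp(-(U - m)).sum()))


S = 8
A = np.roll(np.eye(S), 1, axis=1) + np.roll(np.eye(S), -1, axis=1)  # cycle graph adjacency
R = A - 2.0 * np.eye(S)                                             # rows sum to zero
lam, V = np.linalg.eigh(R)
gap = -np.sort(lam)[-2]                                             # |lambda_2(R)|

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(S))
energies = [rng.normal(scale=2.0, size=S) for _ in range(5)]

spreads = {}
for t in [0.1, 1.0, 5.0, 20.0]:
    Q = (V * np.exp(t * lam)) @ V.T                                 # q_t = exp(tR)
    offsets = [ed_exact(U, Q, p) - mle_loss(U, p) for U in energies]
    spreads[t] = max(offsets) - min(offsets)
    print(f"t={t:5.1f}  spread of ED - MLE across energies: {spreads[t]:.2e}")
print("spectral gap |lambda_2| =", gap, "; large-t offset approx -log S =", -np.log(S))
```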

Figures (12)

  • Figure 1: Visualisation of a typical state space of a tabular dataset: Numerical entries taking values in $\mathbb R^d$, cyclical categorical entries (e.g. season), ordinal categorical entries (e.g. age), unstructured categorical entries, and variables with an absorbing state associated with masking the entry.
  • Figure 2: Comparison of energy discrepancy and contrastive divergence on the dataset with $16$ dimensions and $5$ states. Rows $1$ and $2$ show the estimated density and synthesised samples, respectively.
  • Figure 3: Comparison of energy discrepancy and contrastive divergence on the synthetic tabular datasets.
  • Figure 4: Calibration results comparison between the baseline (left) and energy discrepancy (right) on the adult dataset.
  • Figure 5: Scaling limit of the introduced perturbations. Top: Convergence of rescaled cyclical and ordinal perturbations $y_{S^2t}/S$ for base time parameters $t= 0.01$ and $t= 0.05$ to a Gaussian on $[0, 1)$ with non-trivial boundary conditions. One can see that the perturbations converge to a fixed shape on the normalised state space. Bottom: Convergence of rescaled cyclical and ordinal perturbations $(y_{St} - \mathbb E[y_{St}])/\sqrt{S}$ for base time parameters $t= 0.1$ and $t= 0.5$ to a Gaussian on $\mathbb R$ (red line). The orange mark indicates the initial state. One can see that the perturbations remain non-trivial as the state space grows to infinity at rate $\sqrt{S}$.
  • ...and 7 more figures
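
Following Figure 1, a single tabular row can mix numerical and categorical entries. The sketch below perturbs such a row by adding Gaussian noise to the numerical columns and applying a closed-form discrete diffusion to the categorical columns. It covers only two of the structures in Figure 1: the uniform perturbation, which for the assumed rate matrix $R = \tfrac1S\mathbf 1\mathbf 1^\top - I$ gives $q_t = e^{-t}I + (1-e^{-t})\tfrac1S\mathbf 1\mathbf 1^\top$ (keep the entry with probability $e^{-t}$, otherwise redraw it uniformly), and an absorbing perturbation that masks the entry with probability $1-e^{-t}$. The function `perturb_row`, the mask code `-1`, and the parameters `sigma` and `t` are hypothetical.

```python
import numpy as np


def perturb_row(num, cat, n_states, t, sigma, rng, absorbing=None):
    """Hypothetical mixed-data perturbation of one tabular row.

    num       : (d_num,) float array of numerical entries
    cat       : (d_cat,) int array, cat[j] in {0, ..., n_states[j] - 1}
    n_states  : (d_cat,) number of categories per column
    t         : diffusion time for the categorical columns
    sigma     : standard deviation of the Gaussian noise on numerical columns
    absorbing : optional boolean array marking columns that use masking
    """
    y_num = num + sigma * rng.normal(size=num.shape)
    n_states = np.asarray(n_states)
    if absorbing is None:
        absorbing = np.zeros(cat.shape, dtype=bool)
    # A jump has happened by time t with probability 1 - e^{-t}.
    jump = rng.random(cat.shape) >= np.exp(-t)
    # Uniform structure: after a jump the state is redrawn uniformly (possibly unchanged).
    resampled = rng.integers(0, n_states)
    y_cat = np.where(jump & ~absorbing, resampled, cat)
    # Absorbing structure: after a jump the entry is masked, encoded here as -1.
    y_cat = np.where(jump & absorbing, -1, y_cat)
    return y_num, y_cat


rng = np.random.default_rng(0)
y_num, y_cat = perturb_row(
    num=np.array([0.3, -1.2]),
    cat=np.array([2, 0, 4]),
    n_states=[5, 3, 7],
    t=0.5,
    sigma=0.1,
    rng=rng,
    absorbing=np.array([False, False, True]),  # third column uses masking
)
```

Keeping the two noise scales `sigma` and `t` separate lets numerical and categorical columns be perturbed with different strengths.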

Theorems & Definitions (10)

  • Definition 1: Energy Discrepancy
  • Theorem 1
  • Proposition 1
  • Theorem 2: Scaling limit
  • Theorem 2
  • proof
  • Proposition 1
  • proof
  • Theorem 2: Scaling limit
  • proof