Discrete distributions are learnable from metastable samples

Abhijith Jayakumar; Andrey Y. Lokhov; Sidhant Misra; Marc Vuffray

TL;DR

The paper tackles learning the stationary distribution of a high-dimensional discrete system when the observed data come from metastable regions where a Markov chain mixes slowly. It introduces two notions of metastability, including $\eta$-strong metastability, and proves that the single-variable conditionals of metastable states are close, in average total variation (TV) distance, to those of the true stationary distribution. Leveraging this, the authors show that conditional-likelihood-based learning, particularly pseudo-likelihood (PL), can recover near-optimal estimates of the energy function and model parameters from metastable data, with explicit bounds that depend on the chain's conductance and flip-bound parameters. They provide concrete results for Ising models and demonstrate the approach numerically on the Curie-Weiss model, where PL learns the correct parameters even though the data are drawn from a metastable distribution, whereas maximum likelihood estimation (MLE) fails. The work bridges statistical physics and learning theory, enabling robust learning in slow-mixing regimes and suggesting extensions to broader energy-based models and neural parametrizations.
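To make the claim about conditionals concrete, the following is a minimal sketch rather than the authors' construction. It assumes a Curie-Weiss model with weights $\exp(J M^2 / (2n) + hM)$, where $M$ is the total magnetization, and takes as a stand-in metastable state the stationary distribution restricted to positive magnetization (a bottleneck region at low temperature). The function names, the parameter values `n=100`, `J=1.5`, and the choice to average the conditional TV uniformly over spins are all illustrative assumptions. Because both distributions depend only on $M$, everything can be computed exactly, with no sampling.

```python
import math


def shell_log_weights(n, J, h=0.0):
    """Log-weight of each magnetization shell; k is the number of +1 spins."""
    out = []
    for k in range(n + 1):
        M = 2 * k - n  # total magnetization of the shell
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        out.append(log_binom + J * M * M / (2 * n) + h * M)
    return out


def compare_metastable_to_stationary(n=100, J=1.5, h=0.0):
    """Return (global TV, average single-spin conditional TV).

    The metastable stand-in (an assumption of this sketch) is the stationary
    distribution conditioned on the region A = {M > 0}.
    """
    logw = shell_log_weights(n, J, h)
    top = max(logw)
    w = [math.exp(x - top) for x in logw]  # shift by the max for stability
    Z = sum(w)
    p = [x / Z for x in w]  # stationary shell probabilities
    in_A = [2 * k - n > 0 for k in range(n + 1)]
    mass_A = sum(pk for pk, a in zip(p, in_A) if a)
    q = [pk / mass_A if a else 0.0 for pk, a in zip(p, in_A)]

    # Both distributions are uniform inside each shell, so the global TV
    # distance reduces to a sum over shells.
    global_tv = 0.5 * sum(abs(qk - pk) for qk, pk in zip(q, p))

    # The conditionals of spin i differ only when flipping it leaves A.
    # That requires sigma_i = +1 and M - 2 <= 0, i.e. the boundary shells.
    avg_cond_tv = 0.0
    for k in range(n + 1):
        M = 2 * k - n
        if not in_A[k] or M - 2 > 0:
            continue
        field = J * (M - 1) / n + h  # local field felt by a +1 spin
        p_minus = 1.0 / (1.0 + math.exp(2.0 * field))  # stationary P(sigma_i = -1 | rest)
        # Restricted conditional is a point mass on +1, so TV equals p_minus;
        # k / n is the fraction of spins in the shell that are +1.
        avg_cond_tv += q[k] * (k / n) * p_minus
    return global_tv, avg_cond_tv


if __name__ == "__main__":
    g, c = compare_metastable_to_stationary()
    print(f"global TV: {g:.4f}, average conditional TV: {c:.3e}")
```

With these illustrative parameters the global TV distance is close to 1/2, since the restricted state misses the negative-magnetization half of the distribution, while the average conditional TV is many orders of magnitude smaller, because only the thin boundary shell contributes.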

Abstract

Physically motivated stochastic dynamics are often used to sample from high-dimensional distributions. However, such dynamics often get stuck in specific regions of their state space and mix very slowly to the desired stationary state. This causes such systems to approximately sample from a metastable distribution, which is usually quite different from the desired stationary distribution of the dynamics. We rigorously show that, in the case of multi-variable discrete distributions, the true model describing the stationary distribution can be recovered from samples produced from a metastable distribution under minimal assumptions about the system. This follows from a fundamental observation that the single-variable conditionals of metastable distributions that satisfy a strong metastability condition are on average close to those of the stationary distribution. This holds even when the metastable distribution differs considerably from the true model in terms of global metrics like Kullback-Leibler divergence or total variation distance. This property allows us to learn the true model using a conditional-likelihood-based estimator, even when the samples come from a metastable distribution concentrated in a small region of the state space. Explicit examples of such metastable states can be constructed from regions that effectively bottleneck the probability flow and cause poor mixing of the Markov chain. For the specific case of binary pairwise undirected graphical models (i.e., Ising models), we extend our results to show rigorously that data coming from metastable states can be used to learn the parameters of the energy function and recover the structure of the model.
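For reference, the standard pseudo-likelihood objective for an Ising model can be written out as below. This is a sketch in assumed notation ($J_{ij}$ for couplings, $h_i$ for fields, $\sigma^{(t)}$ for the $t$-th of $m$ samples); the paper's exact estimator, parametrization, and any regularization may differ.

```latex
% Ising model on \sigma \in \{-1,+1\}^n (assumed notation)
\mu(\sigma) \propto \exp\Big( \sum_{i<j} J_{ij}\,\sigma_i \sigma_j + \sum_{i} h_i\,\sigma_i \Big)

% Single-variable conditional: depends only on the local field of spin i
\mu(\sigma_i \mid \sigma_{-i})
  = \frac{\exp\big( \sigma_i \,( h_i + \sum_{j \neq i} J_{ij}\,\sigma_j ) \big)}
         {2 \cosh\big( h_i + \sum_{j \neq i} J_{ij}\,\sigma_j \big)}

% Pseudo-likelihood estimate from samples \sigma^{(1)}, \dots, \sigma^{(m)}
(\hat{J}, \hat{h}) \in \arg\max_{J,\,h} \;
  \frac{1}{m} \sum_{t=1}^{m} \sum_{i=1}^{n}
  \log \mu_{J,h}\big( \sigma_i^{(t)} \,\big|\, \sigma_{-i}^{(t)} \big)
```

The objective involves only single-variable conditionals, which is why closeness of these conditionals between the metastable and stationary distributions is the relevant property, rather than closeness in a global metric.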
