MESSY Estimation: Maximum-Entropy based Stochastic and Symbolic densitY Estimation

Tony Tohme; Mohsen Sadr; Kamal Youcef-Toumi; Nicolas G. Hadjiconstantinou

TL;DR

MESSY estimation recasts maximum-entropy density estimation as a linear inverse problem: a gradient flow driven by the gradient of the log of a maximum-entropy distribution (MED) ansatz yields moment equations that are linear in the unknown parameters, enabling efficient recovery of a tractable symbolic density from samples. It introduces a Monte Carlo symbolic regression step to select basis functions, a multi-level recursive density-recovery scheme, and an ensemble-averaged, orthonormalized framework with cross-entropy corrections to maintain low bias and good conditioning. The approach yields accurate density estimates for challenging cases (bi-modal, discontinuous, and near the limit of realizability), with favorable scaling in sample size and moderate dimensionality, and produces compact symbolic representations of the density. By combining the linear MED solve with symbolic basis search and multi-level modeling, MESSY offers a practical and expressive density-recovery tool with potential applications to data-driven dynamics and model closure.

Abstract

We introduce MESSY estimation, a Maximum-Entropy based Stochastic and Symbolic densitY estimation method. The proposed approach recovers probability density functions symbolically from samples using moments of a gradient flow in which the ansatz serves as the driving force. In particular, we construct a gradient-based drift-diffusion process that connects samples of the unknown distribution function to a guess symbolic expression. We then show that when the guess distribution has the maximum entropy form, the parameters of this distribution can be found efficiently by solving a linear system of equations constructed using the moments of the provided samples. Furthermore, we use symbolic regression to explore the space of smooth functions and find optimal basis functions for the exponent of the maximum entropy functional, leading to good conditioning. The cost of the proposed method for each set of selected basis functions is linear in the number of samples and quadratic in the number of basis functions. However, the underlying acceptance/rejection procedure for finding optimal and well-conditioned bases adds to the computational cost. We validate the proposed MESSY estimation method against other benchmark methods for the case of a bi-modal and a discontinuous density, as well as a density at the limit of physical realizability. We find that the addition of a symbolic search for basis functions improves the accuracy of the estimation at a reasonable additional computational cost. Our results suggest that the proposed method outperforms existing density recovery methods in the limit of a small to moderate number of samples by providing a low-bias and tractable symbolic description of the unknown density at a reasonable computational cost.
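
The linear solve described in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes an ansatz of the form $\hat{f}(x) \propto \exp(\lambda \cdot H(x))$ with unit diffusion, so that stationarity of the moments of $H$ under the gradient flow gives $\sum_j \lambda_j \langle H_i' H_j' \rangle = -\langle H_i'' \rangle$. The function name `fit_maxent_linear`, the sign convention, and the choice of basis are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_maxent_linear(samples, grad_basis, lap_basis):
    """Solve L @ lam = b for f_hat(x) ∝ exp(lam · H(x)) in 1D (assumed form).

    L_ij = <H_i'(x) H_j'(x)> and b_i = -<H_i''(x)>, both estimated as sample
    means. Forming L costs O(N * Nb^2): linear in the number of samples N and
    quadratic in the number of basis functions Nb.
    """
    G = grad_basis(samples)            # shape (N, Nb): first derivatives of H
    lap = lap_basis(samples)           # shape (N, Nb): second derivatives of H
    L = G.T @ G / len(samples)         # empirical <H_i' H_j'>
    b = -lap.mean(axis=0)              # empirical -<H_i''>
    return np.linalg.solve(L, b)

# Hypothetical check: standard normal samples with polynomial basis H = (x, x^2).
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
grad_H = lambda s: np.stack([np.ones_like(s), 2.0 * s], axis=1)
lap_H = lambda s: np.stack([np.zeros_like(s), 2.0 * np.ones_like(s)], axis=1)

lam = fit_maxent_linear(x, grad_H, lap_H)
print(lam)  # should be close to (0, -0.5), since log f(x) = -x^2/2 + const
```

The normalization constant is not needed for the solve, because only derivatives of the basis functions enter the system. Conditioning of `L` depends on the chosen basis, which is the motivation the abstract gives for searching over basis functions.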

Paper Structure (26 sections, 2 theorems, 38 equations, 15 figures, 1 table, 5 algorithms)

Introduction
Related methods.
Our Contributions.
Gradient flow and theoretical motivation
Ansatz as the target density of Gradient flow
Maximum Entropy Distribution as an ansatz for the gradient flow
Orthonormalizing the basis functions.
Comparing the proposed formulation to standard Maximum Entropy Distribution
Symbolic-Based Maximum Entropy Distribution
Multi-level density recovery
Algorithm for MESSY estimation
Results
Bi-modal distribution function
Limit of realizability
Discontinuous distributions
...and 11 more sections

Key Result

Proposition 3.1

The distribution function $f(t)$ governed by the Fokker-Planck equation converges to $\hat{f}$ as $t \rightarrow \infty$. Furthermore, the cross-entropy distance between $f$ and $\hat{f}$ decreases monotonically during this transition.
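
A short worked derivation shows why a statement of this kind holds. It is a sketch under stated assumptions rather than the paper's proof: the drift is taken to be $\nabla \log \hat{f}$ with unit diffusion, the distance is measured by the Kullback-Leibler divergence, and $f$ is assumed to decay fast enough that boundary terms vanish.

```latex
% Assumed gradient flow and its Fokker-Planck equation
dX_t = \nabla \log \hat{f}(X_t)\, dt + \sqrt{2}\, dW_t ,
\qquad
\frac{\partial f}{\partial t}
  = -\nabla \cdot \big( f\, \nabla \log \hat{f} \big) + \Delta f
  = \nabla \cdot \Big( f\, \nabla \log \frac{f}{\hat{f}} \Big).

% Time derivative of the KL divergence (mass conservation removes the +1 term,
% then integrate by parts once)
\frac{d}{dt} \int f \log \frac{f}{\hat{f}} \, dx
  = \int \frac{\partial f}{\partial t} \log \frac{f}{\hat{f}} \, dx
  = -\int f \, \Big| \nabla \log \frac{f}{\hat{f}} \Big|^{2} dx \;\le\; 0 .
```

The right-hand side vanishes only when $\nabla \log (f/\hat{f}) = 0$ wherever $f > 0$, that is, when $f = \hat{f}$ for normalized densities, which is consistent with $\hat{f}$ being the stationary solution.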

Figures (15)

Figure 1: Expression tree for $x^2 \times \cos(x)$.
Figure 2: Density estimation using KDE, MxED, MESSY-P, and MESSY-S given (a) 100, (b) 1,000, and (c) 10,000 samples.
Figure 3: Comparing the relative error in (a) the first four moments, (b) two higher order moments (i.e. fifth and sixth moments), (c) KL Divergence, and (d) the execution time for KDE, MxED, MESSY-P, and MESSY-S in recovering distribution function for different sample sizes. Here, the error bar (in black) corresponds to the standard error of the empirical measurements.
Figure 4: Convergence of MESSY estimation to target distribution function by (a) increasing the order of polynomial basis functions for MESSY-P or (b) increasing the number of randomly selected symbolic basis functions with $N_m=2$ for MESSY-S.
Figure 5: KL Divergence, execution time, and condition number against the degrees of freedom, i.e. the order of polynomial basis functions for MESSY-P or the number of symbolic basis functions with $N_m=2$ for the MESSY-S estimate.
...and 10 more figures
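
The expression tree in Figure 1 can be illustrated with a small sketch of how a symbolic expression such as $x^2 \times \cos(x)$ might be stored and evaluated. The `Node` class, the operator names, and `evaluate` are hypothetical and are not taken from the paper's code.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A node of a binary expression tree (illustrative representation)."""
    op: str                          # "x", "*", "pow", "cos", or a numeric constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def evaluate(node: Node, x: float) -> float:
    """Recursively evaluate the tree at the point x."""
    if node.op == "x":
        return x
    if node.op == "cos":             # unary operator: uses the left child only
        return math.cos(evaluate(node.left, x))
    if node.op == "*":
        return evaluate(node.left, x) * evaluate(node.right, x)
    if node.op == "pow":
        return evaluate(node.left, x) ** evaluate(node.right, x)
    return float(node.op)            # leaf holding a numeric constant

# Tree for x^2 * cos(x): the root is "*", with children pow(x, 2) and cos(x).
tree = Node("*",
            Node("pow", Node("x"), Node("2")),
            Node("cos", Node("x")))

print(evaluate(tree, 1.0))           # equals cos(1), since 1^2 = 1
```

A symbolic search over basis functions would generate and score many such trees. This sketch covers only the representation and evaluation step.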

Theorems & Definitions (10)

Proposition 3.1
proof
Definition 4.1
Definition 4.2
Definition 4.3
Definition 4.4
Proposition 4.5
proof
Definition 5.1
Definition 5.2
