Symbolic Density Estimation: A Decompositional Approach

Angelo Rajendram, Xieting Chu, Vijay Ganesh, Max Fieg, Aishik Ghosh

Abstract

We introduce AI-Kolmogorov, a novel framework for Symbolic Density Estimation (SymDE). Symbolic regression (SR) has been used effectively to produce interpretable models in standard regression settings, but its applicability to density estimation remains largely unexplored. To address the SymDE task, we introduce a multi-stage pipeline: (i) problem decomposition through clustering and/or probabilistic graphical model structure learning; (ii) nonparametric density estimation; (iii) support estimation; and (iv) SR on the density estimate. We demonstrate the efficacy of AI-Kolmogorov on synthetic mixture models, multivariate normal distributions, and three exotic distributions, two of which are motivated by applications in high-energy physics. We show that AI-Kolmogorov can discover the underlying distributions or, failing that, provide valuable insight into the mathematical expressions describing them.
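
To make the shape of stages (ii) to (iv) concrete, the following is a minimal one-dimensional sketch, not the AI-Kolmogorov implementation. Everything in it is an assumption made for illustration: the helper names (`gaussian_kde`, `estimate_support`, `select_expression`), the Silverman-style bandwidth rule, the use of the sample range as the support estimate, and above all the replacement of genuine symbolic regression by a brute-force comparison over a small hand-written list of candidate expressions. Stage (i), decomposition, is skipped because the toy data has a single variable and a single component.

```python
import math
import random
import statistics


def gaussian_kde(samples, bandwidth):
    """Stage (ii): return a callable 1D Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def estimate_support(samples):
    """Stage (iii): crude support estimate, here simply the sample range."""
    return min(samples), max(samples)


# Hypothetical candidate library standing in for a real SR search space.
CANDIDATES = {
    "exp(-x^2/2)/sqrt(2*pi)": lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi),
    "exp(-|x|)/2": lambda x: 0.5 * math.exp(-abs(x)),
    "1/(pi*(1+x^2))": lambda x: 1.0 / (math.pi * (1.0 + x * x)),
}


def select_expression(density, support, n_grid=101):
    """Stage (iv) stand-in: pick the candidate with the lowest mean squared
    error against the density estimate on a grid inside the support."""
    lo, hi = support
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    targets = [density(x) for x in grid]

    def mse(f):
        return sum((f(x) - t) ** 2 for x, t in zip(grid, targets)) / n_grid

    scores = {name: mse(f) for name, f in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: samples from a standard normal distribution (fixed seed).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Silverman-style rule-of-thumb bandwidth (an assumption of this sketch).
bandwidth = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)

kde = gaussian_kde(samples, bandwidth)
support = estimate_support(samples)
best, scores = select_expression(kde, support)

print("estimated support:", support)
print("selected expression:", best)
```

A real SR stage would search over expression trees and fit constants rather than score a fixed list, and the support and density estimators would need to handle multivariate, multi-component data. The sketch only shows how the output of each stage feeds the next.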

Paper Structure

This paper contains 41 sections, 7 equations, 23 figures, 25 tables.

Figures (23)

  • Figure 1: The AI-Kolmogorov pipeline: decomposition (clustering/structure learning), density estimation, support estimation, symbolic regression, and warm-start refinement. Optional workflows are indicated by dashed arrows.
  • Figure 2: Ablation study on clustering and structure learning. Top: clustering results. Bottom: structure learning results.
  • Figure 3: Prediction (left) and residuals with respect to the ground truth (right) for the lowest-loss expression. The maximum predicted density and maximum absolute residual are 0.140 and 0.003, respectively.
  • Figure 4: Left: prediction, with the regions used for local probability mass validation marked A and B. Right: residuals with respect to the KDE for the lowest-loss expression. The maximum predicted density is 3.677, and the maximum absolute residual is 0.640.
  • Figure 5: Left: prediction, with the regions used for local probability mass validation marked A and B. Right: residuals with respect to the NSF for the lowest-loss expression. The maximum predicted density is 4555.25, and the maximum absolute residual is 1789.14.
  • ...and 18 more figures