Meta-Statistical Learning: Supervised Learning of Statistical Estimators

Maxime Peyrard, Kyunghyun Cho

TL;DR

This work reframes the design of statistical estimators as an amortized supervised learning problem over a meta-distribution of data-generating processes. By using permutation-invariant encoders such as Set Transformer, the framework learns estimators that optimize frequentist properties and can emulate Bayesian ideas without committing to a fixed prior. The authors validate the approach on two tasks—normality testing (classification) and mutual information estimation (regression)—demonstrating strong out-of-distribution generalization and substantial efficiency gains, even with compact models. The study highlights a path toward automated discovery of generalizable statistical estimators and outlines both practical benefits and key limitations for future exploration.

Abstract

Statistical inference, a central tool of science, revolves around the study and the usage of statistical estimators: functions that map finite samples to predictions about unknown distribution parameters. In the frequentist framework, estimators are evaluated based on properties such as bias, variance (for parameter estimation), accuracy, power, and calibration (for hypothesis testing). However, crafting estimators with desirable properties is often analytically challenging, and sometimes impossible, e.g., there exists no universally unbiased estimator for the standard deviation. In this work, we introduce meta-statistical learning, an amortized learning framework that recasts estimator design as an optimization problem via supervised learning. This takes a fully empirical approach to discovering statistical estimators; entire datasets are input to permutation-invariant neural networks, such as Set Transformers, trained to predict the target statistical property. The trained model is the estimator, and can be analyzed through the classical frequentist lens. We demonstrate the approach on two tasks: learning a normality test (classification) and estimating mutual information (regression), achieving strong results even with small models. Looking ahead, this paradigm opens a path to automate the discovery of generalizable and flexible statistical estimators.
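The recipe in the abstract (sample many datasets from a meta-distribution, label each with the true value of the target property, and fit a permutation-invariant model on the resulting pairs) can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration and not the authors' setup: the toy meta-distribution over Gaussians, the target (the standard deviation), the hand-picked pooled features standing in for a Set Transformer encoder, and the least-squares fit standing in for gradient-based training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(n):
    """Draw one data-generating process from a toy meta-distribution, then a dataset from it."""
    mu = rng.uniform(-2.0, 2.0)       # hypothetical range for the mean
    sigma = rng.uniform(0.5, 3.0)     # hypothetical range for the standard deviation
    x = rng.normal(mu, sigma, size=n)
    return x, sigma                   # label = true value of the target property

def encode(x):
    """Permutation-invariant encoding: pooled statistics of the centred elements.

    A stand-in for a learned set encoder; reordering x leaves the output unchanged.
    """
    c = x - x.mean()
    return np.array([1.0, np.mean(np.abs(c)), np.sqrt(np.mean(c ** 2))])

def make_batch(num_datasets, n):
    """Build a supervised batch of (encoded dataset, true target) pairs."""
    pairs = [sample_task(n) for _ in range(num_datasets)]
    feats = np.stack([encode(x) for x, _ in pairs])
    targets = np.array([s for _, s in pairs])
    return feats, targets

# "Meta-training": supervised regression from whole datasets to the target.
F_train, y_train = make_batch(5000, 20)
w, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)

def learned_estimator(x):
    """The fitted model is itself the estimator: dataset in, estimate out."""
    return float(encode(x) @ w)

# Frequentist-style evaluation on fresh datasets from the same meta-distribution.
F_test, y_test = make_batch(2000, 20)
mse_learned = float(np.mean((F_test @ w - y_test) ** 2))
print("test MSE of learned estimator:", mse_learned)
```

Once fitted, `learned_estimator` is applied to a new dataset in a single forward computation, which is the sense in which the cost of estimator design is amortized over training.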

Paper Structure (43 sections, 1 theorem, 33 equations, 10 figures, 4 tables)

This paper contains 43 sections, 1 theorem, 33 equations, 10 figures, 4 tables.

Key Result

Proposition 1

The MSE of any estimator $f(X)$ admits the decomposition:
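For orientation, the textbook bias-variance decomposition, conditional on a fixed data-generating process with target value $\theta$, has the form below. The symbol $\theta$ and the conditioning notation are assumptions made here; the paper's own statement may use different notation or condition on additional quantities.

$$
\mathbb{E}_{X}\!\left[(f(X) - \theta)^2 \mid \theta\right]
= \underbrace{\left(\mathbb{E}_{X}[f(X) \mid \theta] - \theta\right)^2}_{\text{bias}^2}
+ \underbrace{\operatorname{Var}_{X}\!\left(f(X) \mid \theta\right)}_{\text{variance}}
$$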

Figures (10)

  • Figure 1: Illustration of Meta-Statistical Learning
  • Figure 2: OOMD evaluation of normality tests as a function of the input dataset size against the two best baselines: Shapiro-Wilk and Lilliefors. (Red line: training cut-off)
  • Figure 3: OOMD consistency of MI estimators: MSE as a function of the input dataset size. (Red line: training cut-off)
  • Figure 4: Training curves: Comparison of training convergence of meta-statistical models on the correlation task.
  • Figure 5: Generalization Across Dataset Lengths and Meta-Distributions. For each subplot, the left panel illustrates the performance of meta-statistical models on test datasets that vary in input length, including lengths not observed during training, while remaining within the training meta-distribution. For each subplot, the right panel presents the same comparison but for test datasets sampled from entirely new meta-distributions, with distributions unseen during training. Note that LSTM is excluded because its errors are an order of magnitude higher.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Proposition 1: Conditional MSE decomposition into frequentist bias and variance
  • proof
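
A decomposition of MSE into squared bias and variance, as named in Proposition 1, can be checked numerically for any fixed data-generating process. The sketch below does this for the sample standard deviation on Gaussian data; the choice of estimator, the values of `sigma`, `n`, and `trials`, and the Monte Carlo setup are illustrative assumptions and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 10, 200_000   # hypothetical true value, dataset size, repetitions

# Many independent datasets from the same fixed data-generating process.
samples = rng.normal(0.0, sigma, size=(trials, n))

# Estimator under study: sample standard deviation with Bessel's correction.
estimates = samples.std(axis=1, ddof=1)

mse = np.mean((estimates - sigma) ** 2)
bias = estimates.mean() - sigma
variance = estimates.var()            # ddof=0, so the identity holds for the empirical moments

print("MSE:              ", mse)
print("bias^2 + variance:", bias ** 2 + variance)
print("bias:             ", bias)
```

The two printed MSE figures agree up to floating-point error, and the bias comes out negative: even with Bessel's correction, the sample standard deviation underestimates the true value on Gaussian data, which is consistent with the abstract's remark that no universally unbiased estimator of the standard deviation exists.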