MLE convergence speed to information projection of exponential family: Criterion for model dimension and sample size -- complete proof version--

Yo Sheena

TL;DR

This work analyzes how quickly the maximum likelihood estimator (MLE) converges to the information projection within an exponential-family framework by deriving the expansion of the estimation risk, measured in K-L divergence, up to order $n^{-2}$. The leading term scales with model complexity (roughly $p/(2n)$ in well-specified exponential-family settings), while the second-order term incorporates higher-order cumulants, enabling a more precise assessment of when the MLE is close to the closest distribution in the model. The author proposes the $p-n$ criterion to decide whether the current sample size $n$ and model dimension $p$ suffice, and provides an algorithm and a practical calibration (via a Bayes-error bound) for exponential families. The approach is illustrated on real datasets (e.g., wine quality and abalone) to guide model compression, and the framework is connected to information criteria such as TIC and AIC, highlighting its use in model acceptance and complexity control.
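As a rough illustration of the leading term only, the sketch below compares $p/(2n)$ with a Monte Carlo estimate of the K-L risk in a simple well-specified case: a $p$-dimensional Gaussian with identity covariance and unknown mean, where the K-L divergence between the true distribution and the plug-in MLE distribution is $\|\bar{x}-\mu\|^2/2$. The Gaussian setting and the function names `first_order_risk` and `simulated_risk` are assumptions chosen for illustration; this is not the paper's $p-n$ criterion, which also uses the $n^{-2}$ term and a Bayes-error bound.

```python
import random


def first_order_risk(p, n):
    """Leading term p / (2n) of the K-L estimation risk."""
    return p / (2 * n)


def simulated_risk(p, n, trials=20000, seed=0):
    """Monte Carlo estimate of E[KL] for N(mu, I_p) with mu estimated by the MLE.

    Assumption for illustration: the true mean is 0 and the covariance is the
    identity, so each coordinate of the sample mean is distributed N(0, 1/n)
    and can be drawn directly instead of averaging n observations.
    """
    rng = random.Random(seed)
    scale = n ** -0.5
    total = 0.0
    for _ in range(trials):
        kl = 0.0
        for _ in range(p):
            xbar = rng.gauss(0.0, 1.0) * scale  # one coordinate of the MLE
            kl += 0.5 * xbar * xbar             # contribution to ||xbar - mu||^2 / 2
        total += kl
    return total / trials


if __name__ == "__main__":
    p, n = 4, 100
    print("first-order term:", first_order_risk(p, n))
    print("simulated risk:  ", simulated_risk(p, n))
```

In this Gaussian case the risk equals $p/(2n)$ exactly, so the two numbers should agree up to Monte Carlo error; for misspecified or more complex models the higher-order terms described above would matter.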

Abstract

For a parametric model of distributions, the closest distribution in the model to the true distribution located outside the model is considered. Measuring the closeness between two distributions with the Kullback-Leibler (K-L) divergence, the closest distribution is called the "information projection." The estimation risk of the maximum likelihood estimator (MLE) is defined as the expectation of the K-L divergence between the information projection and the predictive distribution with the plugged-in MLE. Here, the asymptotic expansion of the risk is derived up to $n^{-2}$-order, and the sufficient condition on the risk for the Bayes error rate between the true distribution and the information projection to be lower than a specified value is investigated. Combining these results, the "$p-n$ criterion" is proposed, which determines whether the MLE is sufficiently close to the information projection for the given model and sample. In particular, the criterion for an exponential family model is relatively simple and can be used for a complex model with no explicit form of normalizing constant. This criterion can constitute a solution to the sample size or model acceptance problem. Use of the $p-n$ criterion is demonstrated for two practical datasets. The relationship between the results and information criteria is also studied.

Paper Structure

This paper contains 16 sections, 4 theorems, 188 equations, 1 table.

Key Result

Theorem 1

The MLE estimation risk with respect to K-L divergence is given by

Theorems & Definitions (7)

  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Theorem 2
  • proof
  • Corollary 2