Foundation of Calculating Normalized Maximum Likelihood for Continuous Probability Models

Atsushi Suzuki, Kota Fukuzawa, Kenji Yamanishi

TL;DR

This work resolves a long-standing gap by proving that the normalized maximum likelihood (NML) model complexity for continuous probabilistic models can be computed via the same estimator-based integration method previously used in discrete settings. The authors introduce a novel decomposition rooted in the coarea formula from geometric measure theory, replacing ill-suited Lebesgue-based fiber integrals with Hausdorff-measure-based integrals and a non-square Jacobian to account for dimension mismatch between data and parameter spaces. They derive a precise estimator-PDF representation $p[\boldsymbol{\theta}_{\flat} \mu_{\theta}]$ and prove that the parametric complexity $\text{Comp}_{v}(\mu_{\Theta})$ equals $\int p[\boldsymbol{\theta}_{\flat} \mu_{\theta}](\boldsymbol{\theta}) v(\boldsymbol{\theta}) \, d\mathcal{L}^{K}(\boldsymbol{\theta})$, validating the continuous-case MC calculation. A concrete example with the exponential model illustrates the practical computation of the MC, and the results generalize prior continuous MDL approaches by providing a complete, rigorous proof. The findings have significant implications for model selection in continuous settings, underpinning the reliability of MDL-based criteria in a broad class of problems.

Abstract

The normalized maximum likelihood (NML) code length is widely used as a model selection criterion based on the minimum description length principle, where the model with the shortest NML code length is selected. A common method to calculate the NML code length is to use the sum (for a discrete model) or integral (for a continuous model) of a function defined by the distribution of the maximum likelihood estimator. While this method has been proven to correctly calculate the NML code length of discrete models, no proof has been provided for continuous cases. Consequently, it has remained unclear whether the method can accurately calculate the NML code length of continuous models. In this paper, we solve this problem affirmatively, proving that the method is also correct for continuous cases. Remarkably, completing the proof for continuous cases is non-trivial in that it cannot be achieved by merely replacing the sums in discrete cases with integrals, as the decomposition trick applied to sums in the discrete model case proof is not applicable to integrals in the continuous model case proof. To overcome this, we introduce a novel decomposition approach based on the coarea formula from geometric measure theory, which is essential to establishing our proof for continuous cases.
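
The estimator-based method described above can be sketched numerically for an exponential model. The following is a minimal illustration under stated assumptions, not the paper's own worked example: the data are taken to be $n$ i.i.d. draws from an exponential distribution with rate $\theta$, the parameter range is restricted to a hypothetical interval $[\theta_{\min}, \theta_{\max}]$ with a constant weight of 1, and all function names are invented for this sketch. Under these assumptions the MLE is $n / \sum_i x_i$, its density evaluated at the true parameter is $n^{n} e^{-n} / (\Gamma(n)\,\theta)$, and the integral has the closed form $n^{n} e^{-n} \log(\theta_{\max} / \theta_{\min}) / \Gamma(n)$.

```python
import math

def mle_density_at_truth(theta, n):
    """Density of the MLE n / sum(x), evaluated at the true rate theta.

    Assumes x_1..x_n are i.i.d. Exponential(rate=theta), so that
    sum(x) follows a Gamma distribution with shape n and rate theta.
    """
    s = n / theta  # value of sum(x) at which the MLE equals theta
    log_pdf_s = (n * math.log(theta) + (n - 1) * math.log(s)
                 - theta * s - math.lgamma(n))
    jacobian = n / theta**2  # |ds/dt| for s = n / t, evaluated at t = theta
    return math.exp(log_pdf_s) * jacobian

def complexity_by_estimator(n, theta_min, theta_max, steps=20000):
    """Midpoint-rule integral of the MLE density over [theta_min, theta_max]."""
    h = (theta_max - theta_min) / steps
    return h * sum(mle_density_at_truth(theta_min + (i + 0.5) * h, n)
                   for i in range(steps))

def complexity_closed_form(n, theta_min, theta_max):
    """n^n e^{-n} / Gamma(n) * log(theta_max / theta_min)."""
    log_const = n * math.log(n) - n - math.lgamma(n)
    return math.exp(log_const) * math.log(theta_max / theta_min)

def complexity_direct_n1(theta_min, theta_max, steps=20000):
    """Direct normaliser for n = 1: integrate max_theta p(x; theta) = 1 / (e x)
    over the data x whose MLE 1 / x lies in [theta_min, theta_max]."""
    lo, hi = 1.0 / theta_max, 1.0 / theta_min
    h = (hi - lo) / steps
    return h * sum(math.exp(-1.0) / (lo + (i + 0.5) * h) for i in range(steps))
```

For $n = 1$ the data space and the parameter space have the same dimension, so the direct integral of the maximized likelihood can be compared with the estimator-based integral without any fiber decomposition; for larger $n$ the sketch only checks the estimator-based integral against its closed form.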

Paper Structure (16 sections, 12 theorems, 75 equations, 1 figure)


Key Result

Proposition 1

Suppose that $\lambda$ is a strictly positive measure, that is, $\lambda(A)>0$ holds for any nonempty open set $A$. Then a continuous PDF of a probability measure $\mu$ is unique whenever it exists.

Figures (1)

  • Figure 1: Example of the coarea formula applied to an estimator map. In this example, $\mathcal{X} = \mathbb{R}^{2}$ and $\hat{\theta}(\boldsymbol{x}) = x_{1}^{2} + 4 x_{2}^{2}$. For $\int_{A} \mathrm{h}(\boldsymbol{x}) \, \mathrm{d}\mathcal{L}^{D}(\boldsymbol{x}) = \int_{\mathbb{R}^{K}} \mathrm{g}(\boldsymbol{\theta}) \, \mathrm{d}\mathcal{L}^{K}(\boldsymbol{\theta})$ to hold, the integral of $\mathrm{h}$ over the gray area, which is bounded by $S_{\theta} = \hat{\theta}^{-1}(\{\theta\})$ and $S_{\theta + \Delta \theta} = \hat{\theta}^{-1}(\{\theta + \Delta \theta\})$, should be approximated by $\mathrm{g}(\theta) \Delta \theta$. The gap between $S_{\theta}$ and $S_{\theta + \Delta \theta}$ is not uniform everywhere; near a point $\boldsymbol{x}$ it is proportional to $(J \hat{\theta}(\boldsymbol{x}))^{-1}$. Hence, $\mathrm{g}$ should be given by integrating the product of $\mathrm{h}(\boldsymbol{x})$ and $(J \hat{\theta}(\boldsymbol{x}))^{-1}$ over the ellipse $S_{\theta}$. Since the ellipse is locally one-dimensional, the integral with respect to the two-dimensional Lebesgue measure does not work. Moreover, since an ellipse is not a straight line, the integral with respect to the one-dimensional Lebesgue measure cannot be applied either. Instead, we need the integral with respect to the one-dimensional Hausdorff measure.
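
The decomposition in the caption can be checked numerically for the same map $x_{1}^{2} + 4 x_{2}^{2}$. The sketch below is illustrative only: the region $A = \{\boldsymbol{x} : x_{1}^{2} + 4 x_{2}^{2} \le T\}$, the two test integrands, and every function name are assumptions made for this example rather than anything taken from the paper. Each ellipse $S_{\theta}$ is parametrized as $(\sqrt{\theta}\cos t, \tfrac{1}{2}\sqrt{\theta}\sin t)$, the one-dimensional Hausdorff measure is computed as arc length, and for $K = 1$ the Jacobian is taken to be the gradient norm.

```python
import math

def jacobian(x1, x2):
    """Gradient norm of the map x1^2 + 4 x2^2 (the coarea Jacobian when K = 1)."""
    return math.hypot(2.0 * x1, 8.0 * x2)

def fiber_integral(h, theta, n_t=400):
    """g(theta): integral of h / J over the ellipse S_theta w.r.t. arc length."""
    r = math.sqrt(theta)
    dt = 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) * dt
        x1, x2 = r * math.cos(t), 0.5 * r * math.sin(t)
        # Arc-length element of the parametrised ellipse (Hausdorff H^1 measure).
        ds = math.hypot(-r * math.sin(t), 0.5 * r * math.cos(t)) * dt
        total += h(x1, x2) / jacobian(x1, x2) * ds
    return total

def coarea_rhs(h, theta_max, n_theta=400, n_t=400):
    """Integrate g(theta) over (0, theta_max) with the midpoint rule."""
    d_theta = theta_max / n_theta
    return sum(fiber_integral(h, (j + 0.5) * d_theta, n_t) * d_theta
               for j in range(n_theta))

T = 3.0
# h = 1: the left-hand side is the area of the ellipse, pi * T / 2.
area_via_fibers = coarea_rhs(lambda x1, x2: 1.0, T)
# h = x1^2: the left-hand side is the second moment, pi * T^2 / 8.
moment_via_fibers = coarea_rhs(lambda x1, x2: x1 * x1, T)
```

Keeping the arc-length element and the Jacobian as separate factors mirrors the caption: the first accounts for integrating over a curved one-dimensional set, the second for the non-uniform gap between neighboring level sets.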

Theorems & Definitions (46)

  • Remark 1
  • Definition 2: discrete PPM and probability mass function
  • Definition 3: continuous PPM and probability density function
  • Remark 4
  • Proposition 1
  • Definition 5: normalized maximum likelihood (NML) and model complexity (MC)
  • Remark 6
  • Definition 7: maximum likelihood estimator (MLE)
  • Proposition 2
  • Definition 8: pushforward measure
  • ...and 36 more