Estimation of the Learning Coefficient Using Empirical Loss

Tatsuyoshi Takio, Joe Suzuki

TL;DR

The paper addresses the estimation of the learning coefficient $\lambda$, a key parameter governing the asymptotic behavior of WAIC and WBIC in both regular and singular models. It introduces an estimator that combines WBIC with the Empirical Loss $T_n$, namely $\hat{\lambda}_T = (\mathrm{WBIC}_n - nT_n)/\log n$, which eliminates the need for multiple inverse-temperature settings. The authors prove consistency and show through simulations that the estimator achieves lower bias and notably lower variance than existing methods, especially for singular models and when MCMC outliers occur. These results make WBIC-based model-selection criteria more reliable and practical in complex Bayesian settings and deepen the understanding of singular model behavior.
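To make the formula for $\hat{\lambda}_T$ concrete, the following is a minimal sketch on a toy regular model, not the authors' implementation. It assumes (this summary does not confirm it) that $\mathrm{WBIC}_n$ is the posterior mean of the negative log-likelihood at inverse temperature $\beta = 1/\log n$, and that $T_n$ is Watanabe's training loss $-\frac{1}{n}\sum_i \log \mathbb{E}_\theta[p(x_i \mid \theta)]$ under the ordinary posterior. The model (a normal distribution with unknown mean and unit variance), the prior, and all function names are hypothetical choices made for illustration; the conjugate prior is used only so that the tempered posterior can be sampled exactly without MCMC.

```python
import numpy as np

def log_lik_matrix(x, mu_draws):
    """Log-density of N(mu, 1) at each data point; shape (draws, n)."""
    diff = x[None, :] - mu_draws[:, None]
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * diff**2

def tempered_posterior_draws(x, beta, n_draws, rng, prior_sd=10.0):
    """Exact draws from prior N(0, prior_sd^2) times likelihood^beta.

    Assumption: a conjugate normal prior, chosen so no MCMC is needed.
    """
    precision = 1.0 / prior_sd**2 + beta * x.size
    mean = beta * x.sum() / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)

def lambda_hat_T(loglik_beta, loglik_one):
    """(WBIC_n - n T_n) / log n from two log-likelihood matrices.

    loglik_beta: draws at beta = 1/log n (assumed WBIC definition).
    loglik_one:  draws at beta = 1 (assumed definition of T_n).
    """
    n = loglik_beta.shape[1]
    # WBIC_n: average over draws of the total negative log-likelihood.
    wbic = -loglik_beta.sum(axis=1).mean()
    # n T_n: negative sum of log posterior-predictive densities,
    # computed with a numerically stable log-mean-exp over draws.
    m = loglik_one.max(axis=0)
    log_pred = m + np.log(np.exp(loglik_one - m).mean(axis=0))
    n_T = -log_pred.sum()
    return (wbic - n_T) / np.log(n)

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0.0, 1.0, size=n)  # data from a realizable, regular model

draws_beta = tempered_posterior_draws(x, 1.0 / np.log(n), 4000, rng)
draws_one = tempered_posterior_draws(x, 1.0, 4000, rng)

lam = lambda_hat_T(log_lik_matrix(x, draws_beta),
                   log_lik_matrix(x, draws_one))
# This model is regular with d = 1, so the estimate should be near d/2.
print(f"lambda_hat_T = {lam:.3f}")
```

For a singular model such as the two-component Poisson mixture used in the paper's experiments, the two sets of draws would instead come from an MCMC sampler, and `lambda_hat_T` would be applied to the resulting log-likelihood matrices in the same way.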

Abstract

The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical models, the learning coefficient is given by $d/2$, where $d$ is the dimension of the parameter space. More generally, it is defined as the absolute value of the largest pole of a zeta function derived from the Kullback-Leibler divergence and the prior distribution. However, except for specific cases such as reduced-rank regression, the learning coefficient cannot be derived in closed form. Watanabe proposed a numerical method to estimate the learning coefficient, which Imai further refined to improve its convergence properties. These methods use the asymptotic behavior of WBIC and have been shown to be statistically consistent as the sample size grows. In this paper, we propose a novel numerical estimation method that fundamentally differs from previous approaches and leverages a new quantity, "Empirical Loss," which was introduced by Watanabe. Through numerical experiments, we demonstrate that our proposed method exhibits both lower bias and lower variance than the methods of Watanabe and Imai. Additionally, we provide a theoretical analysis that explains why our method outperforms existing techniques and present empirical evidence that supports our findings.

Paper Structure

This paper contains 16 sections, 6 theorems, 48 equations, 6 figures, 3 tables.

Key Result

Proposition 1

Assume that the statistical model and the true distribution are in a regular relationship and realizable. Moreover, if $\varphi(\theta_*) > 0$, then $\lambda = d/2$ holds (Watanabe, 2009).

Figures (6)

  • Figure 1:
  • Figure 2: Graph of learning coefficient estimates versus sample size (horizontal axis) for each method. Left: Example of a regular model using a normal distribution; Right: Example of a singular model using a two-component mixture Poisson distribution.
  • Figure 3: For Watanabe's method with $\beta_1 = 1/\log n$ and $\beta_2 > \beta_1$, the horizontal axis represents the gap $\beta_2-\beta_1$. The left panel shows the bias measured for each $\beta$ gap, while the right panel shows the corresponding variance.
  • Figure 4: Graph of the sum of log-likelihoods per iteration. Left: a typical iteration (e.g., iteration 183); Right: iteration 184.
  • Figure 5: Graph of the sum of log-likelihoods for each iteration. Left: the original graph; Right: the graph after 50 artificial outlier points (red dots) have been added (from iteration 4001 to 4050).
  • ...and 1 more figure

Theorems & Definitions (12)

  • Example 1
  • Proposition 1
  • Example 2
  • Example 3
  • Proposition 2
  • Proposition 3
  • proof
  • Proposition 4
  • Proposition 5
  • proof
  • ...and 2 more