An Asymptotic Equation Linking WAIC and WBIC in Singular Models
Naoki Hayashi, Takuro Kutsuna, Sawa Takamuku
TL;DR
This paper addresses model selection in singular statistical models, where conventional criteria such as AIC and BIC do not apply. It derives an asymptotic equation that links WAIC and WBIC, two criteria computed from posteriors at different inverse temperatures. Using singular learning theory, it expresses WAIC in terms of the WBIC posterior through the real log canonical threshold (RLCT) $\lambda$ and the singular fluctuation $\nu$, and shows that WAIC can be approximated from the WBIC posterior, that is, the posterior at inverse temperature $\beta=1/\log n$. The main result is an explicit relationship between $n\mathbb{E}[\hat{G}_n(\beta)]$ and $\mathbb{E}[\hat{F}_n]$, up to correction terms, which enables asymptotically unbiased estimation of WAIC from WBIC posterior samples. The paper also incorporates estimators for $\lambda$ (the Imai estimator) within this framework. In practice, the result opens a path toward lower sampling cost when WAIC and WBIC are used together in singular models, and it clarifies how the RLCT governs the asymptotic behavior of both criteria.
Abstract
In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for which conventional criteria such as the Akaike Information Criterion and the Bayesian Information Criterion are inapplicable due to the breakdown of normal approximations for the likelihood and posterior. To address this, the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC) have been proposed. Since WAIC and WBIC are computed using posterior distributions at different temperature settings, separate posterior sampling is generally required. In this paper, we theoretically derive an asymptotic equation that links WAIC and WBIC, despite their dependence on different posteriors. This equation yields an asymptotically unbiased expression of WAIC in terms of the posterior distribution used for WBIC. The result clarifies the structural relationship between these criteria within the framework of singular learning theory, and deepens understanding of their asymptotic behavior. This theoretical contribution provides a foundation for future developments in the computational efficiency of model selection in singular models.
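To make the point about separate posterior sampling concrete, the following minimal sketch computes WAIC from draws at inverse temperature $\beta=1$ and WBIC from draws at $\beta=1/\log n$, using the standard definitions of the two criteria. It is an illustration under stated assumptions, not the estimator derived in the paper: the toy model (a unit-variance normal with a conjugate normal prior on the mean) is regular rather than singular and is chosen only because its tempered posterior can be sampled exactly, and every function name, the prior variance, and the sample sizes are hypothetical.

```python
import numpy as np


def log_lik_matrix(x, mu_samples):
    """Pointwise log-likelihoods, shape (num_draws, n), for the model N(mu, 1)."""
    diff = x[None, :] - mu_samples[:, None]
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * diff**2


def sample_tempered_posterior(x, beta, num_draws, rng, prior_var=100.0):
    """Exact draws from the posterior proportional to likelihood**beta * prior.

    Assumption: prior mu ~ N(0, prior_var), so the tempered posterior is normal.
    """
    precision = beta * x.size + 1.0 / prior_var
    mean = beta * x.sum() / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=num_draws)


def waic(log_lik):
    """WAIC on the per-sample scale, from draws taken at beta = 1."""
    n = log_lik.shape[1]
    # Log of the posterior-mean likelihood per data point (stable log-mean-exp).
    m = log_lik.max(axis=0)
    log_pred = m + np.log(np.exp(log_lik - m).mean(axis=0))
    training_loss = -log_pred.mean()
    # Functional variance: sum over data of the posterior variance of log-lik.
    functional_variance = log_lik.var(axis=0, ddof=1).sum()
    return training_loss + functional_variance / n


def wbic(log_lik):
    """WBIC: posterior mean of the total negative log-likelihood,
    from draws taken at beta = 1 / log(n)."""
    return (-log_lik.sum(axis=1)).mean()


rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=200)
n = x.size

# Two separate sampling runs, one per inverse temperature.
mu_beta_one = sample_tempered_posterior(x, 1.0, 4000, rng)
mu_beta_wbic = sample_tempered_posterior(x, 1.0 / np.log(n), 4000, rng)

waic_value = waic(log_lik_matrix(x, mu_beta_one))
wbic_value = wbic(log_lik_matrix(x, mu_beta_wbic))
print(f"WAIC (per sample): {waic_value:.4f}")
print(f"WBIC (total):      {wbic_value:.4f}")
```

The two calls to `sample_tempered_posterior` correspond to the two sampling runs that the abstract describes as generally required. In this toy model they are cheap, but in a singular model each run would typically be a full MCMC computation, which is the cost that an expression of WAIC in terms of the WBIC posterior is meant to avoid.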
