Learning under Singularity: An Information Criterion improving WBIC and sBIC
Lirui Liu, Joe Suzuki
TL;DR
The paper tackles model selection for singular statistical models, where the regularity assumptions behind traditional criteria such as AIC and BIC do not hold. It introduces Learning under Singularity (LS), an information criterion that combines an empirical WAIC-like predictive loss with a log-sample penalty, and provides a method to estimate the learning coefficient $\lambda$ when it is unknown. Theoretical results show that LS achieves asymptotic performance comparable to WBIC when $\lambda$ is known, while being more stable and independent of the auxiliary inverse temperature $\beta_0$; it also remains applicable when maximum-likelihood estimation is problematic. Empirical evaluations on reduced-rank regression and Gaussian mixtures show that LS is robust to large sample sizes, unknown learning coefficients, and nonregular settings. It outperforms WBIC in scenarios where the choice of $\beta_0$ destabilizes WBIC, and it remains usable in scenarios where sBIC is limited by its dependence on the existence of the MLE.
Abstract
We introduce a novel Information Criterion (IC), termed Learning under Singularity (LS), designed to enhance the functionality of the Widely Applicable Bayes Information Criterion (WBIC) and the Singular Bayesian Information Criterion (sBIC). LS requires no regularity constraints and is stable. Watanabe defined a statistical model or learning machine as regular if the mapping from a parameter to a probability distribution is one-to-one and its Fisher information matrix is positive definite; models that do not meet these conditions are termed singular. Over the past decade, several information criteria for singular cases have been proposed, including WBIC and sBIC. WBIC is applicable in non-regular scenarios but has difficulty with large sample sizes and redundantly estimates learning coefficients that are already known. sBIC, in turn, is limited in its broader application by its dependence on maximum likelihood estimates. LS addresses both limitations. It uses the empirical loss from the Widely Applicable Information Criterion (WAIC) to represent the goodness of fit to the statistical model, together with a penalty term similar to that of sBIC. This approach offers a flexible and robust method for model selection, free from regularity constraints.
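
As a rough illustration of the combination described in the abstract, the following Python sketch computes a WAIC-style empirical loss from posterior draws and adds a log-sample penalty. The abstract does not state the formula, so the form $n T_n + \lambda \log n$ used here, with $T_n = -\frac{1}{n}\sum_i \log \mathbb{E}_w[p(x_i \mid w)]$, is an assumption. The function names `empirical_loss` and `ls_score`, the toy Gaussian model, and the value $\lambda = 1/2$ (the regular-case value $d/2$ for a single parameter) are likewise illustrative choices rather than the authors' implementation.

```python
import numpy as np


def empirical_loss(log_lik):
    """WAIC-style empirical loss T_n = -(1/n) * sum_i log E_w[p(x_i | w)].

    log_lik has shape (S, n): log p(x_i | w_s) for S posterior draws w_s.
    """
    # Stable log of the posterior-mean likelihood for each data point
    # (max-shift trick, equivalent to logsumexp minus log S).
    m = log_lik.max(axis=0)
    log_pred = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    return -np.mean(log_pred)


def ls_score(log_lik, lam):
    """Assumed LS form: n * T_n + lam * log(n). Smaller values are preferred."""
    n = log_lik.shape[1]
    return n * empirical_loss(log_lik) + lam * np.log(n)


# Toy usage: Gaussian with unknown mean and unit variance (a regular model).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=200)
n = x.size

# Posterior of the mean under a flat prior is N(mean(x), 1/n).
w = rng.normal(loc=x.mean(), scale=1.0 / np.sqrt(n), size=1000)

# Pointwise log-likelihood matrix of shape (S, n).
log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (x[None, :] - w[:, None]) ** 2

print(ls_score(log_lik, lam=0.5))
```

In this sketch the only model-specific inputs are the pointwise log-likelihood matrix and $\lambda$, so no maximum likelihood estimate or inverse temperature is needed. Under the stated assumption, candidate models would be compared by computing the score for each and selecting the smallest.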
