Learning under Singularity: An Information Criterion improving WBIC and sBIC

Lirui Liu; Joe Suzuki

TL;DR

The paper tackles model selection for singular statistical models, where traditional criteria such as AIC and BIC fail. It introduces Learning under Singularity (LS), an information criterion that combines an empirical WAIC-like predictive loss with a log-sample-size penalty, and provides a method for estimating the learning coefficient $\lambda$ when it is unknown. Theoretical results show that LS achieves asymptotic performance comparable to WBIC when $\lambda$ is known, while offering greater stability and independence from the auxiliary inverse temperature $\beta_0$; it also remains applicable when maximum-likelihood estimation is problematic. Empirical evaluations on reduced-rank regression and Gaussian mixture models demonstrate LS's robustness to large sample sizes, unknown learning coefficients, and nonregular settings, outperforming WBIC in scenarios where the choice of $\beta_0$ destabilizes WBIC and in settings where sBIC depends on the existence of the MLE.
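As a rough illustration of the two ingredients named above, the sketch below computes the WAIC-style empirical (training) loss $T_n = -\frac{1}{n}\sum_{i=1}^{n} \log \frac{1}{S}\sum_{s=1}^{S} p(x_i \mid w_s)$ from a matrix of posterior log-likelihood draws and adds a $\lambda \log n$ penalty. The additive form $n T_n + \lambda \log n$, the function names `waic_empirical_loss` and `ls_score`, and the toy normal-mean model are assumptions made for illustration only; the paper should be consulted for the exact definition of LS and for how $\lambda$ is estimated when it is unknown.

```python
import numpy as np

def waic_empirical_loss(log_lik):
    """WAIC empirical (training) loss T_n.

    log_lik: array of shape (S, n); entry [s, i] is log p(x_i | w_s)
    for posterior draw w_s and observation x_i.
    Returns T_n = -(1/n) * sum_i log( (1/S) * sum_s p(x_i | w_s) ).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # Numerically stable log-mean-exp over the posterior draws (axis 0).
    m = log_lik.max(axis=0)
    log_pred = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(S)
    return -log_pred.mean()

def ls_score(log_lik, lam):
    """HYPOTHETICAL LS-style score: n * T_n + lam * log(n).

    ASSUMPTION: this additive form only illustrates "empirical WAIC
    loss plus a log-sample-size penalty"; it is not taken from the paper.
    """
    n = np.asarray(log_lik).shape[1]
    return n * waic_empirical_loss(log_lik) + lam * np.log(n)

if __name__ == "__main__":
    # Toy regular model (illustrative): x_i ~ N(mu, 1), prior mu ~ N(0, 1).
    rng = np.random.default_rng(0)
    x = rng.normal(0.3, 1.0, size=200)
    n = x.size
    # Conjugate posterior: mu | x ~ N(sum(x) / (n + 1), 1 / (n + 1)).
    mu = rng.normal(x.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=2000)
    # log N(x_i | mu_s, 1) for every (draw, observation) pair, shape (S, n).
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2
    # For a regular model with d parameters, lambda = d / 2 (here d = 1).
    print(ls_score(log_lik, lam=0.5))
```

As with BIC-type criteria, one would presumably compute such a score for each candidate model (each with its own $\lambda$) and prefer the smallest value; in practice `log_lik` would come from MCMC draws, for example from the Stan programs the paper provides.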

Abstract

We introduce a novel information criterion (IC), termed Learning under Singularity (LS), designed to improve on the Widely Applicable Bayesian Information Criterion (WBIC) and the Singular Bayesian Information Criterion (sBIC). LS is effective without regularity constraints and is stable. Watanabe defined a statistical model or learning machine to be regular if the mapping from parameters to probability distributions is one-to-one and its Fisher information matrix is positive definite; models that do not meet these conditions are termed singular. Over the past decade, several information criteria for singular models have been proposed, including WBIC and sBIC. WBIC is applicable in non-regular scenarios but struggles with large sample sizes and redundantly estimates learning coefficients even when they are already known. Conversely, sBIC's broader applicability is limited by its dependence on maximum likelihood estimates. LS addresses these limitations by improving on both WBIC and sBIC. It uses the empirical loss from the Widely Applicable Information Criterion (WAIC) to measure goodness of fit to the statistical model, together with a penalty term similar to that of sBIC. This yields a flexible and robust method for model selection, free from regularity constraints.
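The regularity definition above can be checked numerically on a toy model. The sketch below uses a hypothetical scalar product model $y = a b x + \varepsilon$ with $x, \varepsilon \sim N(0, 1)$ and parameter $w = (a, b)$, chosen here only as a one-dimensional analogue of reduced-rank regression (it is not from the paper). Both conditions fail: $(a, b)$ and $(ta, b/t)$ give the same distribution for any $t \neq 0$, and the Fisher information equals $E[x^2]\begin{pmatrix} b^2 & ab \\ ab & a^2 \end{pmatrix}$, whose determinant is zero for every $(a, b)$. The function names are illustrative assumptions.

```python
import numpy as np

def fisher_information(a, b, n_mc=100_000, seed=0):
    """Monte Carlo Fisher information of the toy model
    y = a*b*x + eps, x ~ N(0, 1), eps ~ N(0, 1), parameters w = (a, b)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_mc)
    y = a * b * x + rng.standard_normal(n_mc)
    resid = y - a * b * x
    # Score of log p(y | x, a, b) = -(y - a*b*x)**2 / 2 + const
    # with respect to (a, b).
    score = np.column_stack([resid * b * x, resid * a * x])
    # Average outer product of the score approximates E[score score^T].
    return score.T @ score / n_mc

def is_positive_definite(info, rtol=1e-8):
    """True if the smallest eigenvalue is clearly positive (relative tolerance)."""
    eig = np.linalg.eigvalsh(info)
    return eig.min() > rtol * max(1.0, eig.max())

if __name__ == "__main__":
    for a, b in [(1.0, 2.0), (0.0, 0.0)]:
        info = fisher_information(a, b)
        # Every score vector is a multiple of (b, a), so info has rank <= 1.
        print((a, b), np.round(info, 2), is_positive_definite(info))
```

By contrast, the linear model $y = c x + \varepsilon$ has Fisher information $E[x^2] = 1 > 0$ in $c$ and is regular; a singular model such as the one above is where criteria relying on regularity break down.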

Paper Structure

This paper contains 13 sections, 1 theorem, 25 equations, and 4 tables.

Introduction
Preliminaries
  Regularity
  Learning Coefficient
  sBIC and WBIC
Proposed Information Criterion: Learning under Singularity (LS)
Experiments
  Application to Reduced-Rank Regression Models
  Application to Gaussian Mixture Models
Conclusion
Stan code
  Stan code for Application to Reduced-Rank Regression Models
  Stan code for Application to Gaussian Mixture Models

Key Result

Theorem 1

Theorems & Definitions (2)

Example 1
Theorem 1
