The Pivotal Information Criterion

Sylvain Sardy; Maxime van Cutsem; Sara van de Geer


TL;DR

PIC is defined as a continuous optimization problem. Its penalty parameter $\lambda$ is selected at the detection boundary (under pure noise), as the quantile of a statistic that the authors prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed.

Abstract

The Bayesian and Akaike information criteria aim at finding a good balance between under- and over-fitting. They are extensively used every day by practitioners. Yet we contend they suffer from at least two afflictions: their penalty parameters, $\lambda=\log n$ and $\lambda=2$ respectively, are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion: PIC is defined as a continuous optimization problem, and the PIC penalty parameter $\lambda$ is selected at the detection boundary (under pure noise). PIC's choice of $\lambda$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied with no noise in compressed sensing. Applied to real data, PIC selects the least complex model among state-of-the-art learners at similar predictive performance.
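To make the "quantile of a pivotal statistic under pure noise" idea concrete, here is a minimal sketch for one special case only: a Gaussian linear model with an intercept. It assumes the transformed loss is the square root of the squared-error loss and that the penalty is an $\ell_1$-type penalty, so that the relevant statistic is the sup-norm of the loss gradient at $\boldsymbol\beta = \mathbf 0$. The paper's general $(\phi, g)$ construction is not reproduced here, and the function names `pivotal_statistic` and `lambda_quantile` are hypothetical.

```python
import numpy as np


def pivotal_statistic(X, y):
    """Sup-norm of the gradient of ||y - b0 - X beta||_2 at beta = 0.

    Assumption: square-root squared-error loss with the intercept b0
    profiled out, so the residual at beta = 0 is y - mean(y).
    """
    r = y - y.mean()
    return np.max(np.abs(X.T @ r)) / np.linalg.norm(r)


def lambda_quantile(X, alpha=0.05, n_sim=2000, sigma=1.0, seed=0):
    """Monte Carlo (1 - alpha)-quantile of the statistic under pure noise."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_sim)
    for i in range(n_sim):
        # Pure-noise response: no covariate has any effect.
        y0 = sigma * rng.standard_normal(n)
        stats[i] = pivotal_statistic(X, y0)
    return np.quantile(stats, 1 - alpha)


# Simulated design matrix with standardized columns (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# The selected lambda should not depend on the unknown noise level.
lam_small = lambda_quantile(X, sigma=0.1)
lam_large = lambda_quantile(X, sigma=100.0)
print(lam_small, lam_large)
```

In this sketch the statistic is unchanged when `y` is shifted or rescaled, so the two printed values agree up to floating-point error. That invariance to the nuisance parameters is what "pivotal" means here: $\lambda$ can be calibrated without estimating the noise level first.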


Paper Structure (19 sections, 8 theorems, 40 equations, 3 figures, 3 tables)

Introduction
Detection boundary and phase transition
Information criteria
Our proposal and paper organization
The pivotal information criterion
Definition
Zero-thresholding function
Finding $(\phi,\, g)$
The pivotal detection boundary $\lambda_\alpha^{\rm PDB}$ in practice
Giving BIC a second chance
Simulation studies
Illustration of (non-)pivotal transformations
Empirical Phase Transition Analysis
Real Data Experiments
Conclusions and future work
...and 4 more sections

Key Result

Theorem 4

Let $\hat{\boldsymbol \beta}_\lambda$ minimize eq:IC0 and denote by ${\boldsymbol \tau}=(\beta_0,\sigma)$ the nuisance parameters. Assume $L$ is twice differentiable at $(\mathbf 0,\hat{\boldsymbol \tau})$, where $\hat{\boldsymbol \tau}$ satisfies $\nabla_{{\boldsymbol \tau}} L (\mathbf 0,\hat{\boldsymbol \tau})$ ... satisfies that $(\mathbf 0,\hat{\boldsymbol \tau})$ is a local minimizer of eq:IC0 if and only if ...

Figures (3)

Figure 1: Illustration of the detection boundary induced by PIC’s composite loss in the canonical Poisson model ($X = I_n$). Top row: simulated datasets with varying background intensities and sparsity levels $s \in \{0,3,3\}$. Middle row: componentwise magnitude of the pivotal gradient. Bottom row: corresponding analysis for the non-pivotal gradient of the canonical GLM choice.
Figure 2: Phase transition behaviour in exact support recovery for the Gaussian linear model. Top row: results for the continuous complexity measure $C$. Bottom row: results for the discrete $C$ employing forward selection.
Figure 3: Phase transition behaviour in exact support recovery for the logistic (top) and Gumbel (bottom) regression models.

Theorems & Definitions (14)

Definition 1
Definition 2
Definition 3
Theorem 4: Zero-thresholding function
Proposition 5
Proposition 6
Theorem 7: Location–scale family
Theorem 8: One-parameter exponential family
Example 1: Bernoulli
Theorem 9: Weighted score loss
...and 4 more
