
The Pivotal Information Criterion

Sylvain Sardy, Maxime van Cutsem, Sara van de Geer

TL;DR

PIC is defined as a continuous optimization problem whose penalty parameter $\lambda$ is selected at the detection boundary (under pure noise): PIC's choice of $\lambda$ is the quantile of a statistic that the authors prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed.
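To make this selection rule concrete, here is a minimal sketch for one assumed setting. It is not PIC's actual criterion, which the summary does not specify. The setting is the Gaussian linear model $y=\beta_0\mathbf 1+X\boldsymbol\beta+\sigma\varepsilon$ fitted with the square-root lasso, used as an illustrative "transformed" loss. With an unpenalized intercept, $\hat{\boldsymbol\beta}=\mathbf 0$ exactly when $\lambda \ge \lambda_0(y)=\|X^\top r\|_\infty/\|r\|_2$, where $r=y-\bar y\mathbf 1$. Under pure noise, $r=\sigma(\varepsilon-\bar\varepsilon\mathbf 1)$, so $\lambda_0$ does not depend on $\beta_0$ or $\sigma$, and its $(1-\alpha)$-quantile can be estimated by Monte Carlo without knowing them. The function names, $\alpha=0.05$, and the number of replicates are hypothetical choices made for illustration.

```python
import math
import random

def zero_threshold_stat(X, y):
    """lambda_0(y): smallest lambda at which the square-root lasso
        min_{beta0, beta} ||y - beta0 - X beta||_2 + lambda * ||beta||_1
    returns beta_hat = 0. At beta = 0 the unpenalized intercept is mean(y),
    so with r = y - mean(y) the KKT condition is ||X^T r||_inf / ||r||_2 <= lambda."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    r = [yi - ybar for yi in y]
    norm_r = math.sqrt(sum(ri * ri for ri in r))
    xt_r = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
    return max(abs(v) for v in xt_r) / norm_r

def null_quantile(X, alpha=0.05, m=500, beta0=0.0, sigma=1.0, seed=0):
    """Monte Carlo (1 - alpha)-quantile of lambda_0 under pure noise
    y = beta0 + sigma * eps, eps ~ N(0, I). beta0 and sigma are exposed only
    to show that the result does not depend on them (pivotality)."""
    rng = random.Random(seed)
    n = len(X)
    stats = sorted(
        zero_threshold_stat(X, [beta0 + sigma * rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(m)
    )
    k = min(m - 1, math.ceil((1 - alpha) * m) - 1)  # empirical quantile index
    return stats[k]

rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
print(null_quantile(X))                         # beta0 = 0, sigma = 1
print(null_quantile(X, beta0=3.0, sigma=25.0))  # same value up to floating-point rounding
```

Because the statistic is pivotal, the same noise draws give the same quantile whatever the nuisance values are. This is why a single simulated $\lambda$ can be used without estimating $\sigma$ first.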

Abstract

The Bayesian and Akaike information criteria aim to strike a good balance between under- and over-fitting, and practitioners use them extensively every day. Yet we contend that they suffer from at least two afflictions. First, their penalty parameters ($\lambda=\log n$ for BIC and $\lambda=2$ for AIC) are too small, leading to many false discoveries. Second, their inherent (best-subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion (PIC). PIC is defined as a continuous optimization problem, and its penalty parameter $\lambda$ is selected at the detection boundary (under pure noise). PIC's choice of $\lambda$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied in compressed sensing without noise. Applied to real data, PIC selects the least complex model among state-of-the-art learners while achieving similar predictive performance.
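The "too small" claim can be checked in closed form in the simplest canonical setting. This sketch assumes an orthonormal design $X=I_n$ and a known $\sigma=1$. Under these assumptions, best-subset selection with criterion $\|y-\mu\|_2^2+\lambda k$ keeps coordinate $i$ exactly when $y_i^2>\lambda$. Under pure noise, each coordinate is therefore falsely selected with probability $P(\chi^2_1>\lambda)=\operatorname{erfc}(\sqrt{\lambda/2})$. The sketch compares AIC ($\lambda=2$) and BIC ($\lambda=\log n$) with a $\lambda$ set at the $(1-\alpha)$-quantile of $\max_i y_i^2$ under pure noise. That last choice is a detection-boundary choice in the spirit of PIC, not PIC itself. The function names are hypothetical.

```python
import math

def false_selection_prob(lam):
    """P(y_i^2 > lam) for y_i ~ N(0, 1): chance that best-subset selection with
    penalty lam per coefficient (X = I_n, sigma = 1 known) keeps a pure-noise coordinate."""
    return math.erfc(math.sqrt(lam / 2.0))

def null_quantile_lambda(n, alpha=0.05):
    """(1 - alpha)-quantile of max_i y_i^2 under pure noise, by bisection:
    the lam for which P(no false discovery among n null coordinates) = 1 - alpha."""
    target = -math.expm1(math.log1p(-alpha) / n)  # 1 - (1 - alpha)**(1/n), computed stably
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if false_selection_prob(mid) > target:
            lo = mid  # too many false selections: raise the threshold
        else:
            hi = mid
    return hi

def null_behaviour(n, lam):
    """(expected number of false discoveries, probability of none) under pure noise."""
    q = false_selection_prob(lam)
    return n * q, (1.0 - q) ** n

n = 1000
for name, lam in (("AIC", 2.0), ("BIC", math.log(n)), ("null quantile", null_quantile_lambda(n))):
    expected_fd, p_none = null_behaviour(n, lam)
    print(f"{name:13s} lambda={lam:6.2f}  E[false discoveries]={expected_fd:8.3f}  P(none)={p_none:.4f}")
```

With $\lambda=2$, each pure-noise coordinate survives with probability $\operatorname{erfc}(1)\approx 0.157$. Roughly 157 of 1000 null coordinates are therefore expected to be selected, whereas the quantile-based $\lambda$ keeps the probability of any false discovery at $\alpha$ by construction.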

Paper Structure

This paper contains 19 sections, 8 theorems, 40 equations, 3 figures, 3 tables.

Key Result

Theorem 4

Let $\hat{\boldsymbol\beta}_\lambda$ minimize eq:IC0, and denote by $\boldsymbol\tau=(\beta_0,\sigma)$ the nuisance parameters. Assume $L$ is twice differentiable at $(\mathbf 0,\hat{\boldsymbol\tau})$, where $\hat{\boldsymbol\tau}$ satisfies $\nabla_{\boldsymbol\tau} L(\mathbf 0,\hat{\boldsymbol\tau})$ … satisfies that $(\mathbf 0,\hat{\boldsymbol\tau})$ is a local minimizer of eq:IC0 if and only if …
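The statement of Theorem 4 is cut off in this summary, so the sketch below only illustrates the structure it describes, under assumptions of our own. First, fit the nuisance $\boldsymbol\tau$ with $\boldsymbol\beta=\mathbf 0$ by solving $\nabla_{\boldsymbol\tau}L(\mathbf 0,\boldsymbol\tau)=\mathbf 0$. Then evaluate $\|\nabla_{\boldsymbol\beta}L(\mathbf 0,\hat{\boldsymbol\tau})\|_\infty$. Assuming an $\ell_1$ penalty, this is the smallest $\lambda$ for which the first-order condition at $(\mathbf 0,\hat{\boldsymbol\tau})$ holds. The sketch compares the untransformed Gaussian negative log-likelihood with the square-root loss, which serves here as one example of a transformed loss and is not necessarily the one used by PIC. The comparison shows why the transformation matters: under pure noise, the first statistic scales like $1/\sigma$, while the second does not depend on $\sigma$. All function names are hypothetical.

```python
import math
import random

def fit_nuisance_at_zero(y):
    """Solve grad_tau L(0, tau) = 0 for tau = (beta0, sigma) under the Gaussian NLL
    L = n*log(sigma) + ||y - beta0 - X beta||^2 / (2 sigma^2):
    beta0_hat = mean(y), sigma_hat^2 = ||y - beta0_hat||^2 / n."""
    n = len(y)
    beta0 = sum(y) / n
    r = [yi - beta0 for yi in y]
    sigma = math.sqrt(sum(ri * ri for ri in r) / n)
    return beta0, sigma, r

def max_abs_xt_r(X, r):
    """||X^T r||_inf."""
    return max(abs(sum(X[i][j] * r[i] for i in range(len(r)))) for j in range(len(X[0])))

def lambda0_nll(X, y):
    """||grad_beta L(0, tau_hat)||_inf for the untransformed Gaussian NLL.
    grad_beta L = -X^T r / sigma^2, so the value scales like 1/sigma (not pivotal)."""
    _, sigma, r = fit_nuisance_at_zero(y)
    return max_abs_xt_r(X, r) / sigma ** 2

def lambda0_sqrt(X, y):
    """Same quantity for the square-root loss ||y - beta0 - X beta||_2.
    Only beta0 is a nuisance there, and its fit at beta = 0 is again mean(y).
    grad_beta = -X^T r / ||r||_2, which is invariant to rescaling y."""
    _, _, r = fit_nuisance_at_zero(y)
    return max_abs_xt_r(X, r) / math.sqrt(sum(ri * ri for ri in r))

rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
eps = [rng.gauss(0.0, 1.0) for _ in range(40)]
for sigma in (0.1, 1.0, 10.0):
    y = [2.0 + sigma * e for e in eps]  # pure noise: beta = 0, beta0 = 2
    # NLL column shrinks tenfold as sigma grows tenfold; square-root column is unchanged.
    print(sigma, lambda0_nll(X, y), lambda0_sqrt(X, y))
```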

Figures (3)

  • Figure 1: Illustration of the detection boundary induced by PIC’s composite loss in the canonical Poisson model ($X = I_n$). Top row: simulated datasets with varying background intensities and sparsity levels $s \in \{0,3,3\}$. Middle row: componentwise magnitude of the pivotal gradient. Bottom row: the corresponding analysis for the non-pivotal gradient of the canonical GLM choice.
  • Figure 2: Phase transition behaviour in exact support recovery for the Gaussian linear model. Top row: results for the continuous complexity measure $C$. Bottom row: results for the discrete $C$, employing forward selection.
  • Figure 3: Phase transition behaviour in exact support recovery for the logistic (top) and Gumbel (bottom) regression models.
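Figure 1 contrasts a pivotal gradient with the non-pivotal gradient of the canonical GLM. PIC's composite loss is not reproduced in this summary, so the sketch below illustrates only the non-pivotal side faithfully. For the pivotal side it uses a hypothetical stand-in. For the canonical Poisson GLM with $X=I_n$ and an intercept, at $\boldsymbol\beta=\mathbf 0$ the fitted intercept satisfies $e^{\hat\beta_0}=\bar y$, and the gradient in $\beta_i$ is $\bar y-y_i$. Its null spread grows like $\sqrt{\mu}$ with the background intensity $\mu$. Dividing by $\sqrt{\bar y}$ is an assumed, only approximately pivotal standardization, not PIC's loss. It gives a statistic whose null quantile stays roughly stable across intensities. The sample sizes, $\alpha$, and function names are illustrative assumptions.

```python
import math
import random

def poisson(rng, mu):
    """Poisson(mu) draw by Knuth's multiplication method (adequate for mu up to a few hundred)."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def gradient_stats(y):
    """Canonical Poisson GLM, X = I_n plus intercept, at beta = 0:
    exp(beta0_hat) = mean(y) and grad_beta_i = mean(y) - y_i.
    Returns (sup-norm of that gradient, same divided by sqrt(mean(y)))."""
    ybar = sum(y) / len(y)
    raw = max(abs(yi - ybar) for yi in y)
    standardized = raw / math.sqrt(ybar) if ybar > 0 else 0.0
    return raw, standardized

def null_quantiles(mu, n=50, m=200, alpha=0.05, seed=0):
    """Monte Carlo (1 - alpha)-quantiles of both statistics under pure noise
    y_i ~ Poisson(mu): constant background intensity, no signal."""
    rng = random.Random(seed)
    raws, stds = [], []
    for _ in range(m):
        raw, std = gradient_stats([poisson(rng, mu) for _ in range(n)])
        raws.append(raw)
        stds.append(std)
    k = min(m - 1, math.ceil((1 - alpha) * m) - 1)  # empirical quantile index
    return sorted(raws)[k], sorted(stds)[k]

for mu in (5.0, 50.0, 200.0):
    raw_q, std_q = null_quantiles(mu)
    print(f"mu={mu:6.1f}  raw quantile={raw_q:7.2f}  standardized quantile={std_q:5.2f}")
```

The raw quantile should grow roughly like $\sqrt{\mu}$, so a single $\lambda$ cannot serve all background intensities. The standardized quantile stays of the same order. Knuth's sampler is used only to keep the sketch within the standard library.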

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 4: Zero-thresholding function
  • Proposition 5
  • Proposition 6
  • Theorem 7: Location–scale family
  • Theorem 8: One-parameter exponential family
  • Example 1: Bernoulli
  • Theorem 9: Weighted score loss
  • ...and 4 more