Optimal Inference After Model Selection

William Fithian, Dennis Sun, Jonathan Taylor

TL;DR

This work formalizes inference after adaptive model selection by conditioning on the selection event and controlling the selective type I error, which recovers long-run frequency guarantees among the hypotheses that are actually tested. It connects selective inference with Lehmann–Scheffé optimality theory for exponential families and develops powerful selective tests and confidence intervals, including new selective z- and t-tests for linear regression and a data-carving strategy that is more powerful than data splitting. It provides computational tools (Monte Carlo sampling methods) and extends the framework to non-Gaussian settings (clinical trials, Poisson scans, GLMs), with simulations illustrating the tradeoff between selection and inference. The discussion clarifies conceptual issues about randomness and interpretation, and argues that conditioning-based inference scales to discipline-wide multiple inference.
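The Monte Carlo idea can be illustrated in its simplest form. The sketch below estimates a selective p-value by rejection sampling: simulate from the null model, keep only the draws that land in the selection event, and report the fraction of kept draws whose test statistic is at least the observed value. The function names, the toy null model $Y \sim N(2, 1)$ with selection event $\{Y > 3\}$, and the sample sizes are illustrative assumptions, not the paper's samplers; rejection sampling also becomes very slow when the selection event is rare.

```python
import math
import random
from typing import Callable, TypeVar

Draw = TypeVar("Draw")


def monte_carlo_selective_p(
    sample_null: Callable[[random.Random], Draw],
    selected: Callable[[Draw], bool],
    statistic: Callable[[Draw], float],
    observed: float,
    n_keep: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P_0(statistic >= observed | selection event) by rejection sampling.

    Null draws outside the selection event are discarded, so the kept draws
    follow the null distribution conditional on selection.
    """
    rng = random.Random(seed)
    kept = exceed = 0
    while kept < n_keep:
        draw = sample_null(rng)
        if not selected(draw):
            continue  # rejected: this draw would not have led to the test
        kept += 1
        exceed += statistic(draw) >= observed
    return exceed / kept


def norm_sf(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1), used only to check the estimate below."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy check: Y ~ N(2, 1) under the null, test performed only if Y > 3, observed y = 3.5.
    estimate = monte_carlo_selective_p(
        sample_null=lambda rng: rng.gauss(2.0, 1.0),
        selected=lambda y: y > 3.0,
        statistic=lambda y: y,
        observed=3.5,
    )
    exact = norm_sf(3.5 - 2.0) / norm_sf(3.0 - 2.0)
    print(f"Monte Carlo: {estimate:.3f}, exact: {exact:.3f}")
```

The same function accepts structured draws: a pair `(Y1, Y2)` with `selected=lambda d: d[0] > 3.0` and `statistic=lambda d: d[0] + d[1]` conditions a sum on a selection event that involves only the first coordinate.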

Abstract

To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency properties among selected hypotheses analogous to those that apply in the classical (non-adaptive) context. Our proposal is closely related to data splitting and has a similar intuitive justification, but is more powerful. Exploiting the classical theory of Lehmann and Scheffé (1955), we derive most powerful unbiased selective tests and confidence intervals for inference in exponential family models after arbitrary selection procedures. For linear regression, we derive new selective z-tests that generalize recent proposals for inference after model selection and improve on their power, and new selective t-tests that do not require knowledge of the error variance.
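A minimal sketch of selective type I error control, assuming the univariate setting of Figure 3: $Y\sim N(\mu,1)$, and the one-sided test of $H_0: \mu = 0$ is performed only when $Y > 3$. The selective p-value is the null tail probability under the law of $Y$ conditional on selection, $P_0(Y \ge y \mid Y > 3) = \bar\Phi(y)/\bar\Phi(3)$, where $\bar\Phi$ denotes the standard normal upper-tail probability. The helper names, the one-sided alternative, and the simulation settings are hypothetical choices for illustration. The simulation generates data under the null and, among the draws where a test is actually performed, compares the rejection rate of the classical test with that of the selective test.

```python
import math
import random


def norm_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for Z ~ N(0, 1); erfc keeps tails accurate."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def naive_p_value(y: float, mu0: float = 0.0) -> float:
    """Classical one-sided p-value P_mu0(Y >= y), ignoring how the test was chosen."""
    return norm_sf(y - mu0)


def selective_p_value(y: float, mu0: float = 0.0, threshold: float = 3.0) -> float:
    """One-sided selective p-value P_mu0(Y >= y | Y > threshold) for Y ~ N(mu0, 1)."""
    if y <= threshold:
        raise ValueError("the test is only performed when Y > threshold")
    return norm_sf(y - mu0) / norm_sf(threshold - mu0)


def error_rates_given_selection(n_performed=2000, alpha=0.05, threshold=3.0, seed=0):
    """Rejection rates of both tests under mu = 0, among draws where a test is performed."""
    rng = random.Random(seed)
    performed = naive_rejections = selective_rejections = 0
    while performed < n_performed:
        y = rng.gauss(0.0, 1.0)  # data generated under the null mu = 0
        if y <= threshold:
            continue  # file drawer: no test is performed, nothing is reported
        performed += 1
        naive_rejections += naive_p_value(y) < alpha
        selective_rejections += selective_p_value(y, 0.0, threshold) < alpha
    return naive_rejections / performed, selective_rejections / performed


if __name__ == "__main__":
    naive_rate, selective_rate = error_rates_given_selection()
    print(f"naive: {naive_rate:.3f}, selective: {selective_rate:.3f}")
```

Every performed test has $y > 3$, so its classical p-value is below $\bar\Phi(3) \approx 0.00135$ and the naive test rejects every time (rate 1.0), while the selective test should reject in roughly 5% of performed tests.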

Paper Structure

This paper contains 37 sections, 9 theorems, 112 equations, 7 figures, 2 tables.

Key Result

Proposition 1

Suppose there are $n$ independently operating research groups in a scientific discipline with a shared, countable question space $\mathcal{Q}$. Research group $i$ collects data $Y_{i}\sim F_{i}$, applies selection rule $\widehat{\mathcal{Q}}_{i}(Y_{i})\subseteq \mathcal{Q}$, and carries out selective … Then, as $n$ grows, the discipline as a whole achieves long-run control over the frequentist error rate …
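The flavor of Proposition 1 can be checked in a toy simulation. In the sketch below, each hypothetical group observes a single $Y_i \sim N(\mu_i, 1)$, draws its own threshold $c_i$, and tests $H_0: \mu_i = 0$ only when $Y_i > c_i$, using the one-sided truncated-Gaussian selective p-value $\bar\Phi(Y_i)/\bar\Phi(c_i)$, where $\bar\Phi$ is the standard normal upper-tail probability. The null fraction, effect size, threshold distribution, and function names are assumptions made for illustration, not the setting of the proposition itself. Pooling across groups, the proportion of false rejections among the true nulls that were actually tested should be close to $\alpha$ for the selective tests but not for the naive ones.

```python
import math
import random


def norm_sf(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_discipline(n_groups: int = 200_000, alpha: float = 0.05,
                        null_fraction: float = 0.9, effect: float = 2.0, seed: int = 1):
    """Toy discipline in which every group picks its own question before testing it.

    Returns the naive and selective false-rejection proportions among the true
    null hypotheses that were actually tested across all groups.
    """
    rng = random.Random(seed)
    nulls_tested = naive_false = selective_false = 0
    for _ in range(n_groups):
        is_null = rng.random() < null_fraction
        mu = 0.0 if is_null else effect
        threshold = rng.uniform(0.0, 2.0)  # each group uses its own selection rule
        y = rng.gauss(mu, 1.0)
        if y <= threshold:
            continue  # question not asked; nothing is reported
        if not is_null:
            continue  # a false rejection requires a true null hypothesis
        nulls_tested += 1
        naive_false += norm_sf(y) < alpha  # classical test of mu = 0
        selective_false += norm_sf(y) / norm_sf(threshold) < alpha  # selective test
    return naive_false / nulls_tested, selective_false / nulls_tested


if __name__ == "__main__":
    naive, selective = simulate_discipline()
    print(f"naive: {naive:.3f}, selective: {selective:.3f}")
```

Because each selective p-value is uniform on the true nulls conditional on the question being asked, the pooled selective proportion settles near $\alpha$ regardless of how the groups' selection rules differ.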

Figures (7)

  • Figure 1: An example of the lasso with $n=2$ observations and $p=3$ variables. Tests are based on the distribution of $Y$, conditional on its landing in the highlighted region.
  • Figure 2: Instead of conditioning on the selection event $A_q$ that question $q$ is asked, we can condition on a finer event, the value of the random variable $S_q$. We call $S_q$ the selection variable.
  • Figure 3: Univariate Gaussian. $Y\sim N(\mu, 1)$ with selection event $A=\{Y>3\}$.
  • Figure 4: Contrast between data splitting and data carving in the example in which $Y_{i}\sim N(\mu,1)$ independently for $i=1,2$. Data splitting discards $Y_{1}$ entirely, while data carving uses the leftover information in $Y_{1}$ for the second-stage inference. When $\mu\ll 3$, data carving effectively uses only about one data point for inference, since there is almost no information left over in $Y_{1}$. But when $\mu\gg 3$, conditioning barely affects the law of $Y_{1}$, and data carving has nearly two data points left over.
  • Figure 5: Contrast between the saturated-model and selected-model tests in the bivariate example, in which we fit a one-sparse model with design matrix $X=I_2$. The selected-model test is based on $\mathcal{L}_0(Y_1 \,|\, A)$, whereas the saturated-model test is based on $\mathcal{L}_0(Y_1 \,|\, Y_2, A)$.
  • ...and 2 more figures
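The comparison in Figure 4 can be reproduced numerically. The sketch below assumes, as the caption's reference to the value 3 suggests, that the question is asked only when $Y_1 > 3$, and it tests $H_0: \mu = \mu_0$ against $\mu > \mu_0$. Data splitting rejects when $Y_2 > \mu_0 + z_{1-\alpha}$; data carving rejects for large $T = Y_1 + Y_2$, with the critical value taken from the null law of $T$ given $Y_1 > 3$, computed by Simpson's rule. The one-sided alternative, the integration settings, and all function names are illustrative assumptions rather than the paper's implementation.

```python
import math
from statistics import NormalDist


def norm_sf(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def carving_sf(t: float, mu: float, threshold: float = 3.0, n_intervals: int = 2000) -> float:
    """P_mu(Y1 + Y2 >= t | Y1 > threshold) for Y1, Y2 iid N(mu, 1), by Simpson's rule.

    Integrates the density of Y1 over (threshold, inf) times P(Y2 >= t - y1).
    """
    lower = threshold
    upper = max(threshold, mu) + 12.0  # the N(mu, 1) density is negligible beyond this
    h = (upper - lower) / n_intervals  # n_intervals must be even for Simpson's rule
    total = 0.0
    for k in range(n_intervals + 1):
        y1 = lower + k * h
        weight = 1 if k in (0, n_intervals) else (4 if k % 2 else 2)
        total += weight * norm_pdf(y1 - mu) * norm_sf(t - y1 - mu)
    return (h / 3.0) * total / norm_sf(threshold - mu)


def carving_critical_value(mu0: float, alpha: float = 0.05, threshold: float = 3.0) -> float:
    """Critical value c with P_mu0(Y1 + Y2 >= c | Y1 > threshold) = alpha, by bisection."""
    lo = threshold + mu0 - 10.0            # tail probability is essentially 1 here
    hi = max(threshold, mu0) + mu0 + 15.0  # and essentially 0 here
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if carving_sf(mid, mu0, threshold) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def split_power(mu: float, mu0: float, alpha: float = 0.05) -> float:
    """Power of the data-splitting test, which rejects when Y2 > mu0 + z_{1-alpha}."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return norm_sf(mu0 + z - mu)  # Y2 is independent of the selection event on Y1


def carve_power(mu: float, mu0: float, alpha: float = 0.05, threshold: float = 3.0) -> float:
    """Power of the data-carving test, which rejects when Y1 + Y2 >= c given Y1 > threshold."""
    c = carving_critical_value(mu0, alpha, threshold)
    return carving_sf(c, mu, threshold)


if __name__ == "__main__":
    for mu0, mu in [(0.0, 1.0), (4.0, 5.0)]:
        print(mu0, mu, round(split_power(mu, mu0), 3), round(carve_power(mu, mu0), 3))
```

Conditional on $Y_1 > 3$, the data still form a one-parameter exponential family with sufficient statistic $T$, and the splitting test also has selective level $\alpha$, so the carving test should be at least as powerful for every $\mu > \mu_0$. The gap is expected to be small when $\mu_0 \ll 3$ and larger when $\mu_0 \gg 3$, in line with the figure.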

Theorems & Definitions (18)

  • Example 1: File Drawer Effect
  • Proposition 1: Discipline-Wide Error Control
  • Proposition 2: Duality of Selective Tests and Confidence Sets
  • Proof
  • Proposition 3: Monotonicity of Selective Error
  • Proof
  • Example 2
  • Theorem 5: Lehmann and Scheffé (1955)
  • Corollary 6: UMPU Selective Tests
  • Theorem 7: Matthes and Truax, Theorem 3.1
  • ...and 8 more
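The duality in Proposition 2 can be made concrete in the univariate setting of Figure 3 ($Y\sim N(\mu,1)$, reported only when $Y>3$). Because $P_\mu(Y \ge y \mid Y > 3)$ is increasing in $\mu$, the set of $\mu_0$ not rejected by either one-sided level-$\alpha/2$ selective test is an interval whose endpoints can be found by bisection. This sketch produces the equal-tailed interval, which is simpler than, and generally different from, the unbiased intervals obtained from UMPU selective tests; the bracket $[-30, y+10]$, the helper names, and the iteration count are illustrative assumptions.

```python
import math
from typing import Optional, Tuple


def norm_sf(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncated_sf(y: float, mu: float, threshold: float = 3.0) -> float:
    """P_mu(Y >= y | Y > threshold) for Y ~ N(mu, 1); nondecreasing in mu."""
    return norm_sf(y - mu) / norm_sf(threshold - mu)


def solve_mu(y: float, target: float, threshold: float = 3.0,
             lo: float = -30.0, hi: Optional[float] = None, iters: int = 100) -> float:
    """Bisection for the mu at which truncated_sf(y, mu) equals target.

    Assumes the root lies in [lo, hi]; the default bracket is a heuristic choice.
    """
    if hi is None:
        hi = y + 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_sf(y, mid, threshold) < target:
            lo = mid  # tail probability too small: the root is at a larger mu
        else:
            hi = mid
    return 0.5 * (lo + hi)


def selective_interval(y: float, alpha: float = 0.05,
                       threshold: float = 3.0) -> Tuple[float, float]:
    """Equal-tailed selective interval: all mu0 that neither one-sided alpha/2 test rejects."""
    if y <= threshold:
        raise ValueError("no interval is reported unless Y > threshold")
    lower = solve_mu(y, alpha / 2.0, threshold)        # P_mu(Y >= y | A) = alpha/2
    upper = solve_mu(y, 1.0 - alpha / 2.0, threshold)  # P_mu(Y <= y | A) = alpha/2
    return lower, upper


if __name__ == "__main__":
    print(selective_interval(4.0))
    print(selective_interval(10.0))
```

For $y$ far above 3 the interval approaches the classical $y \pm 1.96$, while for $y$ just above 3 the lower endpoint can fall far below zero and may leave the default bracket, in which case the bracket should be widened.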