Valid F-screening in linear regression

Olivia McGough, Daniela Witten, Daniel Kessler

TL;DR

The paper tackles invalid inference after F-screening in linear regression by introducing a conditional selective inference framework that accounts for the initial omnibus test $H_0^{1:p}: \beta_1=\cdots=\beta_p=0$ being rejected. It develops selective p-values $p_{H_0^M\mid E}$ that control the selective Type I error and can be computed from standard regression outputs, including a debiased variance estimator $\tilde{\sigma}^2$ for unknown variance. The authors quantify leftover Fisher information after selection, compare to sample splitting, and demonstrate higher information and power in the selective approach, with extensive simulations and real-data reanalyses (prospective and retrospective). They also provide a practical retrospective analysis pathway using only summary statistics and discuss specialized cases and geometry, concluding with limitations and future extensions. Overall, the framework enables valid, end-to-end selective inference for regression coefficients in the common F-screening scenario, including retrospective corrections for published findings.

Abstract

Suppose that a data analyst wishes to report the results of a least squares linear regression only if the overall null hypothesis, $H_0^{1:p}: \beta_1 = \beta_2 = \ldots = \beta_p = 0$, is rejected. This practice, which we refer to as F-screening (since the overall null hypothesis is typically tested using an $F$-statistic), is in fact common practice across a number of applied fields. Unfortunately, it poses a problem: standard guarantees for the inferential outputs of linear regression, such as Type 1 error control of hypothesis tests and nominal coverage of confidence intervals, hold unconditionally, but fail to hold conditional on rejection of the overall null hypothesis. In this paper, we develop an inferential toolbox for the coefficients in a least squares model that is valid conditional on rejection of the overall null hypothesis. We develop selective p-values that lead to tests that are consistent and control the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis. Furthermore, they can be computed without access to the raw data, i.e., using only the standard outputs of a least squares linear regression, and therefore are suitable for use in a retrospective analysis of a published study. We also develop confidence intervals that attain nominal selective coverage, and point estimates that account for having rejected the overall null hypothesis. We derive an expression for the Fisher information about the coefficients resulting from the proposed approach, and compare this to the Fisher information that results from an alternative approach that relies on sample splitting. We investigate the proposed approach in simulation and via re-analysis of two datasets from the biomedical literature.
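The failure of the standard guarantees under F-screening can be seen in a small simulation. The sketch below is an illustration written for this summary, not the authors' simulation code: the function name, sample sizes, and the Gaussian design are all assumptions. It generates data under the overall null, keeps only the datasets whose overall $F$-test rejects, and records how often the usual $t$-test for $\beta_1$ then rejects.

```python
import numpy as np
from scipy import stats


def naive_rate_after_f_screening(n=50, p=5, alpha0=0.05, alpha=0.05,
                                 reps=5000, seed=0):
    """Estimate Pr(naive t-test rejects beta_1 = 0 | overall F-test rejects).

    Hypothetical illustration: data are drawn under the overall null, so every
    rejection of beta_1 = 0 is a Type 1 error.
    """
    rng = np.random.default_rng(seed)
    # Fixed design: intercept plus p Gaussian features (an assumption).
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
    XtX_inv = np.linalg.inv(X.T @ X)
    df_resid = n - p - 1
    f_crit = stats.f.ppf(1 - alpha0, p, df_resid)
    t_crit = stats.t.ppf(1 - alpha / 2, df_resid)

    screened = 0
    rejected = 0
    for _ in range(reps):
        y = rng.standard_normal(n)              # all coefficients are zero
        beta_hat = XtX_inv @ X.T @ y
        resid = y - X @ beta_hat
        rss = resid @ resid
        tss = np.sum((y - y.mean()) ** 2)
        f_stat = ((tss - rss) / p) / (rss / df_resid)
        if f_stat > f_crit:                     # Step 1: F-screening
            screened += 1
            se = np.sqrt(rss / df_resid * XtX_inv[1, 1])
            if abs(beta_hat[1] / se) > t_crit:  # Step 2: naive t-test
                rejected += 1
    return screened, rejected / max(screened, 1)


if __name__ == "__main__":
    n_screened, cond_rate = naive_rate_after_f_screening()
    print(f"datasets passing the F-screen: {n_screened}")
    print(f"naive rejection rate given screening: {cond_rate:.3f}")
```

Unconditionally the naive test rejects at its nominal level, but among the screened datasets the rejection rate should come out well above `alpha`, which is the selective Type 1 error inflation the abstract describes.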

Paper Structure

This paper contains 35 sections, 14 theorems, 122 equations, 10 figures, 2 tables.

Key Result

Theorem 2.1

A test of $H_0^M$ based on the p-value $p_{H_0^M \mid E}(y)$ (\ref{eq:pselective}) controls the selective Type 1 error, conditional on the event that $H_0^{1:p}$ was rejected. That is, for any $\alpha' \in (0,1)$, $\Pr_{\beta_M=0}\left( p_{H_0^M \mid E}(Y) \leq \alpha' \mid Y\in E_1 \right) = \alpha'.$
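The kind of guarantee stated in Theorem 2.1 can be checked numerically in a toy setting. The sketch below is an assumption-laden illustration and is not claimed to reproduce the paper's p-value: it takes $Y \sim N(\beta, I_2)$ with known unit variance and an identity design, screens with a $\chi^2_2$ test, and tests $\beta_1 = 0$ by conditioning on $Y_2$ and on the screening event, which gives a truncated $\chi^2_1$ tail ratio. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def toy_selective_pvalue(y1, y2, alpha0=0.05):
    """Truncation-based p-value for beta_1 = 0 in Y ~ N(beta, I_2).

    Assumes every (y1, y2) pair already passed the chi-square screen
    y1^2 + y2^2 > c. Given Y2 = y2, screening forces Y1^2 > max(c - y2^2, 0),
    so the null law of Y1^2 is a chi-square(1) truncated to that range.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    c = stats.chi2.ppf(1 - alpha0, df=2)
    if np.any(y1 ** 2 + y2 ** 2 <= c):
        raise ValueError("overall null not rejected for some inputs")
    threshold = np.maximum(c - y2 ** 2, 0.0)
    return stats.chi2.sf(y1 ** 2, df=1) / stats.chi2.sf(threshold, df=1)


def simulate_toy(beta2=0.0, reps=200_000, alpha0=0.05, seed=1):
    """Draw data with beta_1 = 0, keep screened draws, return both p-values."""
    rng = np.random.default_rng(seed)
    y1 = rng.standard_normal(reps)           # beta_1 = 0, so the null holds
    y2 = beta2 + rng.standard_normal(reps)   # beta_2 is a nuisance parameter
    keep = y1 ** 2 + y2 ** 2 > stats.chi2.ppf(1 - alpha0, df=2)
    p_selective = toy_selective_pvalue(y1[keep], y2[keep], alpha0)
    p_naive = stats.chi2.sf(y1[keep] ** 2, df=1)
    return p_selective, p_naive


if __name__ == "__main__":
    p_sel, p_naive = simulate_toy()
    print("selective rejection rate at 0.05:", np.mean(p_sel <= 0.05))
    print("naive rejection rate at 0.05:", np.mean(p_naive <= 0.05))
```

Among screened draws, the truncation-based p-value should reject at close to the nominal rate, while the unadjusted $\chi^2_1$ p-value rejects far more often. Conditioning on $Y_2$ is one simple way to remove the nuisance parameter in this toy problem; the paper's construction for general designs and unknown variance is more involved.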

Figures (10)

  • Figure 1: A subset of Table 1 from keil2023LongitudinalSleepPatterns. The $n=826$ participants in the dataset are split into three age groups: under 65 years old, between 65 and 85 years old, and over 85 years old. Each row in the table corresponds to a continuous variable of interest, with means and standard deviations reported for each age group. For each of these continuous variables, an overall $F$-test is conducted to test for a difference in means between the three age groups (corresponding to Step 1 of Box \ref{box:box1}). The associated p-value is reported in the "P Value" column. The last column holds p-values for "post hoc tests", which are typically conducted only if the p-value associated with the overall test is small (corresponding to Step 2 of Box \ref{box:box1}). In this case, the p-values in the last column would be invalid, since they do not account for the fact that they were only computed because $H_0^{1:p}$ (\ref{eq:hov}) was rejected.
  • Figure 2: We consider testing $H_0^{1:2}:\beta_1=\beta_2=0$ using $F_{H_0^{1:2}}$ (\ref{eq:Foverall}) and $H_0^1:\beta_1=0$ using $F_{H_0^1}$ (\ref{eq:fstat}) in the model $Y=X\beta+\epsilon$, where $X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$. The red double cone corresponds to the rejection boundary of the test $H_0^{1:2}$, and the blue planes correspond to the rejection boundary of $H_0^1$. Thus data points that lie farther from the $Y_3$-axis than the double cone lead to rejection of $H_0^{1:2}$, and data points that lie farther from the $(Y_2,Y_3)$-plane than the two blue planes lead to rejection of $H_0^1$. See \ref{app:geometry} for mathematical details.
  • Figure 3: We consider testing $H_0^{1:2}:\beta_1=\beta_2=0$ and $H_0^1:\beta_1=0$ in the model $Y=X\beta+\epsilon$ where $X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and we take the variance of $\epsilon$ to be known. In the left plot we use $\chi^2$-tests to test $H_0^{1:2}$ and $H_0^1$; this is in analogy to F-screening in Box \ref{box:box1} where Step 2 is conducted using a "standard" test. On the right, we use a $\chi^2$-test to test $H_0^{1:2}$, and then we use a test of $H_0^1$ in Step 2 that accounts for the rejection of $H_0^{1:2}$ in Step 1. This leads to a substantial change in the geometry of the rejection region of $H_0^1$. Mathematical details are in \ref{app:geometry}.
  • Figure 4: We construct an $n \times p$ orthogonal design matrix $X$ with $n=100$ and $p=5$. Then, for $\beta_1\in \{0.5,0.25,0\}$ and for a range of values of $s\in [-1,1]$, we generate 1000 response vectors $\tilde{Y}$ according to (\ref{eq:model}) with $\sigma = 1$, $\beta_{j} = s$ for all $j\in \{2, \ldots, p \}$, and $\beta_0=0$. We then compute quantities related to the leftover Fisher information about $\beta_1$ for our selective approach, and for sample splitting with split proportions $\rho\in\{0.1,0.5,0.9\}$. Left: For each of the 1000 realizations $\tilde{y}$ of $\tilde{Y}$, we compute the leftover Fisher information $\mathcal{I}_{Y\mid Y \in R(\tilde{y}; X)}(\beta_1;Y \in R(\tilde{y}; X))$ for our conditional selective inference procedure and display the average over all datasets such that $\tilde{Y}\in R_1(X)$. This approximates ${\mathbb{E}}\left[\mathcal{I}_{Y\mid Y \in R(\tilde{Y}; X)}(\beta_1;Y \in R(\tilde{Y}; X))\mid \tilde{Y}\in R_1(X)\right]$. Further, for $\rho\in \{0.1,0.5,0.9\}$, we display $(X^{\mathrm{te}})_1^\top (I - P_{(X^{\mathrm{te}})_{-1}})(X^{\mathrm{te}})_1/\sigma^2$. Center: We display $\Pr(Y \in R_1(X))$, along with $\Pr(Y^{\mathrm{tr}}_\rho\in R_\rho(X^{\mathrm{tr}}))$ for $\rho \in \{0.1,0.5,0.9\}$. Right: We display the product of the left-hand and center columns, i.e., the right-hand sides of (\ref{eq:info_exp_sel}) and (\ref{eq:info_exp_split}).
  • Figure 5: Left: For the simulation study detailed in Section \ref{subsec:t1control}, we display a $\operatorname{Unif}(0,1)$ quantile-quantile plot of p-values for the test of $H_0^1:\beta_1 = 0$, conditional on the event that $H_0^{1:p}$ (\ref{eq:hov}) is rejected. Here, $H_0^{1:p}$ (and hence $H_0^1$) holds, and so a p-value for $H_0^1:\beta_1=0$ that controls the selective Type 1 error should follow a $\operatorname{Unif}(0,1)$ distribution. If the trace lies below the diagonal, the p-value is anti-conservative. For the subset of simulated datasets for which $H_0^{1:p}$ is rejected using $F_{H_0^{1:p}}$, we display $p_{H_0^M}$ (\ref{eq:pnaive}), $p_{H_0^M \mid E}$ (\ref{eq:pselchisq}), $p^{\tilde{\sigma}^2}_{H_0^M \mid E}$ (\ref{eq:pseltilde}), and $p^{\hat{\sigma}^2}_{H_0^M \mid E}$ (\ref{eq:pselhat}). Middle: For the simulation setting described in \ref{subsec:power}, we plot the power to reject $H_0^{1:p}$ at three different significance levels ($\alpha_0 \in \left\{ 0.05, 0.1, 0.5 \right\}$) with (i) the $F$-statistic based on all of the data, $F_{H_0^{1:p}}$ (\ref{eq:Foverall}), and (ii) the $F$-statistic based on only the training set from sample splitting, $F_{H_0^{1:p}}^{\mathrm{train}}$. Right: For the simulation setting described in \ref{subsec:power}, we plot the conditional probability of rejecting $H_0^1$ at significance level 0.05 (marked with a gray dashed line) given that $H_0^{1:p}$ was rejected at three levels of significance ($\alpha_0\in \{0.05,0.1,0.5\}$). When we reject $H_0^{1:p}$ with $F_{H_0^{1:p}}$, we compute the p-values (i) $p_{H_0^M \mid E}$ (\ref{eq:pselchisq}), and (ii) $p^{\tilde{\sigma}^2}_{H_0^M \mid E}$ (\ref{eq:pseltilde}). When we reject $H_0^{1:p}$ with $F_{H_0^{1:p}}^{\mathrm{train}}$, we compute $p_{H_0^M}^{\text{test}}$.
  • ...and 5 more figures
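The sample-splitting quantity shown in Figure 4, $(X^{\mathrm{te}})_1^\top (I - P_{(X^{\mathrm{te}})_{-1}})(X^{\mathrm{te}})_1/\sigma^2$, is straightforward to compute from a held-out design matrix. The sketch below is a minimal illustration, assuming a full-column-rank test design with no separate intercept column; the function name and the zero-based column index `j` are hypothetical choices made here.

```python
import numpy as np


def split_test_information(X_te, j=0, sigma=1.0):
    """Fisher information about coefficient j carried by the test split.

    Computes x_j^T (I - P_{X_{-j}}) x_j / sigma^2, where P_{X_{-j}} projects
    onto the span of the remaining columns of X_te.
    """
    X_te = np.asarray(X_te, dtype=float)
    x_j = X_te[:, j]
    X_rest = np.delete(X_te, j, axis=1)
    # Residual of x_j after least squares projection onto the other columns.
    coef, *_ = np.linalg.lstsq(X_rest, x_j, rcond=None)
    resid = x_j - X_rest @ coef
    return float(resid @ resid) / sigma ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    for rho in (0.1, 0.5, 0.9):
        n_train = int(rho * X.shape[0])   # first rho * n rows go to training
        info = split_test_information(X[n_train:], j=0, sigma=1.0)
        print(f"rho = {rho}: test-split information about beta_1 = {info:.2f}")
```

The result equals $1 / \left(\sigma^2 [(X^{\mathrm{te}\top} X^{\mathrm{te}})^{-1}]_{11}\right)$, the reciprocal of the usual least squares variance of $\hat\beta_1$ on the test split, so larger training proportions $\rho$ leave less information for inference.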

Theorems & Definitions (34)

  • Theorem 2.1
  • Proposition 2.1
  • Proposition 2.2
  • Remark 1
  • Definition 1: adapted from fithian2017OptimalInferenceModel
  • Proposition 2.3
  • Remark 2
  • Proposition 2.4
  • Proposition 2.5
  • Proposition 2.6
  • ...and 24 more