Revisiting inference after prediction
Keshav Motwani, Daniela Witten
TL;DR
Prediction-based inference uses $(\hat{f}(Z), X)$ to assess the $Y$–$X$ association when $Y$ is costly to observe. The paper compares the proposals of Wang et al. (2020) and Angelopoulos et al. (2023) and shows that the method of Angelopoulos et al. targets the correct parameter $\beta^* = E[XX^\top]^{-1}E[XY]$, while the corrections of Wang et al. target a different quantity and can fail to control the Type I error rate or achieve nominal coverage for general $\hat{f}$. Through simulations and a direct replication of the study in Wang et al. (2020), the authors demonstrate that the Wang et al. corrections are often anticonservative and give poor coverage unless $\hat{f}$ is effectively perfect, whereas the debiasing approach of Angelopoulos et al. yields valid inference regardless of the quality of $\hat{f}$. In the extreme and unrealistic case $\hat{f}(Z) = E[Y \mid Z]$, all methods align. This special case underscores that the Wang et al. corrections target $\beta^*$ only under strong assumptions; the conclusions extend beyond linear models. Overall, the work advocates debiasing-based prediction inference for reliable semi-supervised inference across diverse prediction settings and provides code to reproduce the results.
Abstract
Recent work has focused on the very common practice of prediction-based inference: that is, (i) using a pre-trained machine learning model to predict an unobserved response variable, and then (ii) conducting inference on the association between that predicted response and some covariates. As pointed out by Wang et al. (2020), applying a standard inferential approach in (ii) does not accurately quantify the association between the unobserved (as opposed to the predicted) response and the covariates. In recent work, Wang et al. (2020) and Angelopoulos et al. (2023) propose corrections to step (ii) in order to enable valid inference on the association between the unobserved response and the covariates. Here, we show that the method proposed by Angelopoulos et al. (2023) successfully controls the type 1 error rate and provides confidence intervals with correct nominal coverage, regardless of the quality of the pre-trained machine learning model used to predict the unobserved response. However, the method proposed by Wang et al. (2020) provides valid inference only under very strong conditions that rarely hold in practice: for instance, if the machine learning model perfectly estimates the true regression function in the study population of interest.
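To make the contrast in the abstract concrete, the following is a minimal sketch of a debiased point estimate for linear regression, in the spirit of the prediction-powered estimator of Angelopoulos et al. (2023): the coefficient fitted to predictions on unlabeled data is corrected by a "rectifier" estimated on a small labeled sample. It is not the authors' code. The function names (`ols`, `debiased_ols`, `demo`), the simulated data-generating process, and the deliberately poor predictor are all illustrative assumptions, and only point estimates are computed (no standard errors or confidence intervals).

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def debiased_ols(X_lab, y_lab, f_lab, X_unlab, f_unlab):
    """Debiased point estimate (hypothetical helper, assumed form).

    Fits OLS to the predictions on the unlabeled data, then subtracts a
    rectifier: the OLS fit of the prediction error (f - y) on the labeled data.
    """
    theta_pred = ols(X_unlab, f_unlab)    # association with predicted response
    rectifier = ols(X_lab, f_lab - y_lab)  # bias induced by using predictions
    return theta_pred - rectifier


def demo(seed=0, n_lab=500, n_unlab=10_000):
    """Simulated example (assumption): true slope is 2, predictor's slope is 1."""
    rng = np.random.default_rng(seed)

    def draw(n):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])  # intercept and one covariate
        y = 1.0 + 2.0 * x + rng.normal(size=n)
        # Deliberately poor predictor: it understates the dependence on x.
        f = 1.0 + 1.0 * x + rng.normal(scale=0.5, size=n)
        return X, y, f

    X_lab, y_lab, f_lab = draw(n_lab)
    X_unlab, _, f_unlab = draw(n_unlab)  # true response is unobserved here

    return {
        "naive": ols(X_unlab, f_unlab),   # treats predictions as the response
        "labeled_only": ols(X_lab, y_lab),  # ignores the unlabeled data
        "debiased": debiased_ols(X_lab, y_lab, f_lab, X_unlab, f_unlab),
    }


if __name__ == "__main__":
    for name, coef in demo().items():
        print(f"{name:>12}: intercept={coef[0]:.3f}, slope={coef[1]:.3f}")
```

In this simulated setting the naive slope should sit near the predictor's slope rather than the true one, while the debiased slope should be close to the true value even though the predictor is poor. This illustrates the point estimate only; valid inference additionally requires an appropriate variance estimate, which this sketch does not attempt.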
