Connecting model-based and model-free approaches to linear least squares regression
Lutz Duembgen, Laurie Davies
TL;DR
This paper shows that p-values from model-based linear regression analyses have precise model-free interpretations, framing the problem in terms of orthogonal invariance and Haar measure. It introduces equivalence regions, a model-free reinterpretation of confidence regions, and derives explicit results under Gaussian and Gaussian-sequence models, including connections to Beta and F distributions. The results cover both standard and composite null models and extend to sparse-signal inference, providing exact finite-sample guarantees and guidance for permutation-based and high-dimensional variable selection methods. Overall, the work bridges classical likelihood-based inference and data-centric, model-agnostic interpretations, offering rigorous tools for assessing the relevance of regressors in both low- and high-dimensional settings.
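The connection to Beta and F distributions can be made concrete for the classical test of whether q candidate regressors X1 improve a least-squares fit of y on a full-rank base design X0 with p0 columns. The notation below is ours and the derivation is a standard one, stated under these assumptions (n > p0 + q and RSS0 > 0); the paper's own formulation is more general.

```latex
\begin{aligned}
\mathrm{RSS}_0 &= \lVert y - P_{X_0}\, y \rVert^2, \qquad
\mathrm{RSS}_1 = \lVert y - P_{[X_0,\,X_1]}\, y \rVert^2, \qquad m = n - p_0 - q,\\
R &= \frac{\mathrm{RSS}_0 - \mathrm{RSS}_1}{\mathrm{RSS}_0}, \qquad
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/q}{\mathrm{RSS}_1/m} = \frac{m}{q}\cdot\frac{R}{1-R},\\
p &= P\bigl(F_{q,m} \ge F\bigr) = P\bigl(\mathrm{Beta}(q/2,\, m/2) \ge R\bigr).
\end{aligned}
```

The last identity holds because r ↦ r/(1 − r) is increasing and (m/q)·B/(1 − B) has the F(q, m) distribution when B ~ Beta(q/2, m/2). Under the Gaussian model in which X1 is irrelevant, R ~ Beta(q/2, m/2). The same law holds with y held fixed if X1 is replaced by random regressors whose span, after projecting out X0, is a Haar-uniformly distributed q-dimensional subspace (for instance, independent standard Gaussian columns). Read this way, p is the probability that q such random regressors remove at least the share R of the residual sum of squares.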
Abstract
In a regression setting with a response vector and given regressor vectors, a typical question is to what extent the response is related to these regressors, specifically, how well it can be approximated by a linear combination of the latter. Classical methods for this question are based on statistical models for the conditional distribution of the response, given the regressors. In the present paper it is shown that various p-values resulting from this model-based approach also have a purely data-analytic, model-free interpretation. This finding is derived in a rather general context. In addition, we introduce equivalence regions, a reinterpretation of confidence regions in the model-free context.
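The abstract's central claim, that a model-based p-value can be read without a model, can be checked numerically in the simplest case: the F-test of whether q = 2 candidate regressors X1 improve a least-squares fit of y on a full-rank base design X0 (n × p0, with n > p0 + 2). The sketch below is a hedged illustration, not the paper's code or its general construction. The model-based side uses the closed form P(F(2, m) ≥ f) = (1 + 2f/m)^(−m/2) with m = n − p0 − 2, which is a valid p-value under Gaussian errors. The model-free side holds y, X0 and X1 fixed and replaces X1 by independent standard Gaussian columns Z, whose span after removing X0 is a uniformly (Haar) distributed subspace. All function names are hypothetical. The fraction of draws in which Z removes at least as large a share of the residual sum of squares as X1 should match the F-test p-value up to Monte Carlo error.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def reduction_ratio(y, X0, X1, rss0=None):
    """Share R of RSS(y ~ X0) that is removed by also using the columns of X1."""
    if rss0 is None:
        rss0 = rss(y, X0)
    rss1 = rss(y, np.hstack([X0, X1]))
    return (rss0 - rss1) / rss0


def f_test_pvalue_q2(y, X0, X1):
    """Model-based F-test p-value for exactly q = 2 extra regressors (hypothetical helper).

    Uses P(F(2, m) >= f) = (1 + 2f/m)^(-m/2) with m = n - p0 - 2.
    """
    n, p0 = X0.shape
    assert X1.shape == (n, 2), "this closed form assumes exactly q = 2 extra regressors"
    m = n - p0 - 2
    R = reduction_ratio(y, X0, X1)
    f_stat = (R / 2) / ((1 - R) / m)
    return (1 + 2 * f_stat / m) ** (-m / 2)


def random_regressor_pvalue(y, X0, X1, n_sim=10_000, seed=0):
    """Model-free reading (hypothetical helper): how often do q pure-noise
    Gaussian regressors remove at least as much of RSS(y ~ X0) as X1 does?

    y, X0 and X1 stay fixed; only the replacement regressors Z are random.
    """
    rng = np.random.default_rng(seed)
    n, q = X1.shape
    rss0 = rss(y, X0)
    r_obs = reduction_ratio(y, X0, X1, rss0)
    hits = 0
    for _ in range(n_sim):
        Z = rng.standard_normal((n, q))
        hits += int(reduction_ratio(y, X0, Z, rss0) >= r_obs)
    return hits / n_sim


# Usage on simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(1)
n = 40
X0 = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one regressor
X1 = rng.standard_normal((n, 2))                             # two candidate regressors
y = X0 @ np.array([1.0, 0.5]) + 0.3 * X1[:, 0] + rng.standard_normal(n)

p_model = f_test_pvalue_q2(y, X0, X1)
p_free = random_regressor_pvalue(y, X0, X1)
print(f"F-test p-value: {p_model:.4f}   random-regressor frequency: {p_free:.4f}")
```

Fixing q = 2 keeps the model-based p-value in closed form, so only NumPy is needed; for general q the same comparison would use the Beta(q/2, m/2) or F(q, m) survival function (for example from SciPy) with m = n − p0 − q. Refitting the full least-squares problem for every draw is simple rather than fast.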
