Finite- and Large- Sample Inference for Model and Coefficients in High-dimensional Linear Regression with Repro Samples

Peng Wang; Min-Ge Xie; Linjun Zhang

TL;DR

This work addresses inference in high-dimensional linear regression where model selection uncertainty is central. It introduces a repro samples framework that uses Fisher inversion to construct a compact set of candidate models likely to contain the true sparse model, and Fisher-Dempster inversion to form level-α model confidence sets, with finite-sample guarantees and large-sample extensions. The approach provides confidence sets for both the true model and the regression coefficients (including subsets and transformations) without relying on covariance estimation or consistent model selection. In simulations and real-data analysis it shows better finite-sample performance and tighter intervals than debiased Lasso and bootstrap. The methods are robust to both Gaussian and non-Gaussian error distributions and offer a practical, computationally efficient toolkit for model-uncertainty-aware inference in ultra-high-dimensional settings.

Abstract

In this paper, we present a novel and effective approach to conducting both finite- and large-sample inference for high-dimensional linear regression models. This approach is developed under the so-called repro samples framework, in which we conduct statistical inference by creating and studying the behavior of artificial samples that are obtained by mimicking the sampling mechanism of the data. We construct confidence sets for (a) the true model corresponding to the nonzero coefficients, (b) a single regression coefficient or any collection of regression coefficients, and (c) both the model and regression coefficients jointly. To facilitate the construction of these confidence sets and overcome the computational difficulty of searching all possible models, we use an innovative Fisher inversion technique to construct a model candidate set that includes the true sparse model with probability close to 1, for models with both Gaussian and non-Gaussian errors. The proposed approach fills two major gaps in the high-dimensional regression literature: (1) the lack of effective approaches for addressing model selection uncertainty and providing valid inference for the underlying true model; (2) the lack of effective inference approaches that guarantee finite-sample performance. We provide both finite-sample and asymptotic results to theoretically guarantee the performance of the proposed methods. In addition, our numerical results demonstrate that the proposed methods are valid and achieve better coverage with smaller confidence sets than current state-of-the-art approaches, such as debiasing and bootstrap approaches.
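The idea of building a model candidate set from artificial samples can be illustrated with a small sketch. This is not the paper's algorithm: it assumes Gaussian errors, replaces any efficient search with brute-force enumeration of small supports, and uses a hypothetical size penalty `lam`. The function names `best_support` and `candidate_models`, and all numeric settings, are assumptions made for illustration. For each artificial noise vector, the sketch finds the support whose columns, together with that noise vector, best explain the observed response, and collects the distinct supports found.

```python
import itertools
import numpy as np


def best_support(X, y, u_star, max_size, lam):
    """Support minimising ||y - X_tau b - s * u_star||^2 + lam * |tau|.

    Brute-force search over all supports of size <= max_size (illustrative only).
    """
    p = X.shape[1]
    best_tau, best_score = (), np.inf
    for k in range(max_size + 1):
        for tau in itertools.combinations(range(p), k):
            # Regress y on the chosen columns plus the artificial noise vector.
            design = np.column_stack([X[:, list(tau)], u_star])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            score = float(resid @ resid) + lam * k
            if score < best_score:
                best_tau, best_score = tau, score
    return best_tau


def candidate_models(X, y, n_draws, max_size, lam, rng):
    """Collect the distinct supports selected across artificial noise draws."""
    n = X.shape[0]
    found = set()
    for _ in range(n_draws):
        u_star = rng.standard_normal(n)  # artificial (repro) noise; Gaussian assumed
        found.add(best_support(X, y, u_star, max_size, lam))
    return found


# Toy data (hypothetical): 30 observations, 6 predictors, true support {0, 2}.
rng = np.random.default_rng(0)
n, p = 30, 6
X = rng.standard_normal((n, p))
true_support = (0, 2)
y = X[:, list(true_support)] @ np.array([3.0, -2.0]) + 0.1 * rng.standard_normal(n)

candidates = candidate_models(X, y, n_draws=50, max_size=3, lam=1.0, rng=rng)
print(candidates)
```

In this strong-signal toy setting the candidate set is expected to be very small and to contain the true support. Exhaustive enumeration is only feasible for a handful of predictors, so a high-dimensional implementation would need a different search strategy.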

Paper Structure

This paper contains 46 sections, 30 theorems, 220 equations, 1 figure, 10 tables, 2 algorithms.

Introduction
Contributions
Related works
Notation
Organization
Finding candidate models for $\tau_0$
Identifiability and definition of $\tau_0$
Algorithm for finding candidate models
Theoretical results for models with Gaussian errors
Heterogeneous, non-Gaussian and sub-Gaussian error models
Construction of a level-$\alpha$ Model Confidence Set
Inference for regression coefficients accounting for model selection uncertainty
Inference for a subset of regression coefficients
Two special cases of interest
Extension to models with non-Gaussian errors
...and 31 more sections

Key Result

Lemma 1

Let $\mathbf{H}_\tau$ be the projection matrix of $\mathbf{X}_\tau$ and $\mathbf{H}_{\tau,\mathbf{u}^{rel}}$ be the projection matrix of $(\mathbf{X}_\tau, \mathbf{u}^{rel})$. Let $\gamma^2_{(\mathbf{u}^{rel}, \tau_0)} = 1- \min_{\{\tau: |\tau| < |\tau_0|\}}\frac{\|(I - \mathbf{H}_{\tau, \mathbf{u}^{rel}})\cdots\|}{\cdots}$ ... and moreover $(\tau_0,\bm\beta_0, \sigma_0) = \underset{\tau, \bm\beta_{\tau}, \sigma}{\rm argm}\cdots$ ...

Figures (1)

Figure 1: (a) Confidence curve [xie_confidence_2013] plot on $S^{(d)}$; (b) confidence sets of $\bm\beta_\tau$ (one 3-dimensional ellipsoid and two 2-dimensional ellipsoids) for the three models $\tau$ in the candidate set $S^{(d)} = \{\{1,2,3\}, \{1,2\}, \{1,3\}\}$. In (a), the red line illustrates the case where we aim to construct a level-$0.95$ ($\alpha=0.95$) model confidence set. In this case, our $95\%$ model confidence set for the true $\tau_0$ contains two models; i.e., ${\Gamma}^\tau_\alpha(\mathbf{y}_{obs}) = \{\{1,2,3\}, \{1,2\}\}$. In (b), a $95\%$ joint confidence set for $\bm\beta_0^{full}$ is the union of these three confidence sets: one 3-dimensional ellipsoid on the $(\beta_1, \beta_2, \beta_3)$ space and two 2-dimensional ellipsoids on the $(\beta_1, \beta_2)$ and $(\beta_1, \beta_3)$ spaces, respectively (in each case the remaining $\beta_j$'s are $0$).
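The two panels of the figure can be read as two simple set operations, sketched below. Every number is hypothetical and chosen only so that the outcome matches the caption: the confidence-curve values, the ellipsoid centers, shape matrices and squared radii are not taken from the paper, and the helper names `model_confidence_set` and `in_joint_set` are assumptions. Panel (a) keeps the candidate models whose confidence-curve value does not exceed $\alpha$. Panel (b) tests whether a full coefficient vector lies in the union of per-model ellipsoids, with coefficients outside the model fixed at zero.

```python
import numpy as np


def model_confidence_set(curve, alpha):
    """Keep models whose confidence-curve value does not exceed alpha."""
    return {tau for tau, level in curve.items() if level <= alpha}


def in_joint_set(beta_full, ellipsoids):
    """True if beta_full lies in the union of the per-model ellipsoids.

    Each model tau (1-based indices) maps to (center, shape, radius2); the
    vector is inside that piece if its entries outside tau are exactly zero
    and (b - center)' shape (b - center) <= radius2 on the entries in tau.
    """
    beta_full = np.asarray(beta_full, dtype=float)
    for tau, (center, shape, radius2) in ellipsoids.items():
        idx = np.array(tau) - 1  # convert 1-based labels to 0-based positions
        if np.any(np.delete(beta_full, idx) != 0):
            continue  # nonzero coefficient outside this model
        diff = beta_full[idx] - np.asarray(center, dtype=float)
        if diff @ np.asarray(shape, dtype=float) @ diff <= radius2:
            return True
    return False


# Hypothetical confidence-curve values for the three candidate models.
curve = {(1, 2, 3): 0.30, (1, 2): 0.80, (1, 3): 0.99}
gamma = model_confidence_set(curve, alpha=0.95)

# Hypothetical ellipsoids: (center, shape matrix, squared radius) per model.
ellipsoids = {
    (1, 2, 3): ([1.0, 1.0, 0.2], np.eye(3), 0.25),
    (1, 2): ([1.0, 1.0], np.eye(2), 0.25),
    (1, 3): ([1.0, 0.5], np.eye(2), 0.25),
}
inside = in_joint_set([1.0, 0.9, 0.0], ellipsoids)
outside = in_joint_set([5.0, 5.0, 5.0], ellipsoids)
print(gamma, inside, outside)
```

With these made-up values, the model confidence set keeps $\{1,2,3\}$ and $\{1,2\}$ and drops $\{1,3\}$, mirroring the situation described for panel (a).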

Theorems & Definitions (64)

Lemma 1
Remark 1
Theorem 1
Theorem 2
Remark 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
...and 54 more
