Double-Estimation-Friendly Inference for High-Dimensional Measurement Error Models with Non-Sparse Adaptability

Shijie Cui, Xu Guo, Songshan Yang, Zhe Zhang

TL;DR

This work addresses inference for a single coefficient in high-dimensional measurement-error models under potential misspecification of the outcome or exposure models. It develops a double-robust, decorrelated score test $T_{DF}$ based on a double-robust moment condition, with a variance estimator $\widehat{\sigma}$, and proves asymptotic normality under $H_0$ in both low- and high-dimensional regimes. A sparsity-adaptive procedure extends the method to settings where the sparsity assumption is violated by using Dantzig-selector-like estimators and data-driven tuning, preserving validity via the DEF property. Simulation studies and real-data analysis (CAMP) demonstrate size control and nontrivial power under misspecification, highlighting the approach's robustness and practical relevance for high-dimensional epidemiological and genomic data where measurement error is present.

Abstract

In this paper, we introduce a testing procedure for assessing individual hypotheses in high-dimensional linear regression models with measurement errors. The method remains robust when either the X-model or the Y-model is misspecified. We develop a double robust score function whose expectation remains zero even if one of the two models is incorrect, and we construct a corresponding score test. We first establish the asymptotic normality of the test statistic in a low-dimensional setting and then extend the result to high-dimensional models. Our high-dimensional analysis covers scenarios both with and without the sparsity condition, establishing asymptotic normality and non-trivial power under local alternatives. Simulation studies and a real data analysis demonstrate the effectiveness of the proposed method.

Paper Structure (10 sections, 11 theorems, 84 equations, 2 figures, 4 tables)

Key Result

Theorem 1

Suppose (A1)-(A2) hold. Then, under the null hypothesis, $T_{DF}\stackrel{d}{\longrightarrow} N(0,1)$ if either model (a) or model (b) holds.
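
As an illustration of how a limit of this kind is typically used, the sketch below turns a statistic that is approximately $N(0,1)$ under the null into a p-value and a reject/do-not-reject decision. It assumes a two-sided test, which this summary does not state explicitly, and the function name `two_sided_test` and the input value `2.3` are hypothetical. It does not compute $T_{DF}$ itself.

```python
from statistics import NormalDist
from typing import Tuple


def two_sided_test(t_stat: float, alpha: float = 0.05) -> Tuple[float, bool]:
    """Return (p_value, reject) for a statistic that is N(0,1) under H0.

    Assumption: the test is two-sided, so large |t_stat| counts as
    evidence against the null hypothesis.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Probability of a value at least as extreme as |t_stat| in either tail.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    # Critical value z_{1 - alpha/2}, roughly 1.96 when alpha = 0.05.
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return p_value, abs(t_stat) > critical


# Hypothetical observed value of the statistic, for illustration only.
p_value, reject = two_sided_test(2.3)
print(f"p-value = {p_value:.4f}, reject H0 at 0.05: {reject}")
```

The default `alpha = 0.05` matches the level used for the empirical sizes reported in Figure 1. The normal approximation is only justified under the conditions of the theorem.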

Figures (2)

  • Figure 1: Empirical sizes and powers of $T_{DF}$ at level $\alpha = 0.05$ over 1,000 replications, for Simulation 1 (left panel) and Simulation 2 (right panel). The horizontal dotted line marks the significance level 0.05.
  • Figure 2: Power comparison plot. Left: $X$ and the active $Z$ are uncorrelated. Right: $X$ and the active $Z$ are highly correlated.

Theorems & Definitions (18)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 1
  • Theorem 5
  • Lemma 1
  • proof
  • ...and 8 more