Robust Inference Under Heteroskedasticity via the Hadamard Estimator

Edgar Dobriban, Weijie J. Su, Yachong Yang, Zhixiang Zhang

TL;DR

This work develops a Hadamard-based variance estimator for linear regression under heteroskedasticity that remains unbiased in high dimensions, avoiding the bias that White-type sandwich estimators incur there. It shows that the estimator exists and is well posed by proving invertibility and conditioning results for the matrix $Q\odot Q$, introduces a degrees-of-freedom adjustment that leads to a Hadamard-t method, and proves convergence-rate and asymptotic normality results for key functionals. Simulations show that the Hadamard approach gives better confidence-interval coverage and MSE estimation than classical methods, especially as the dimension grows, and the paper offers practical guidance for estimating the SNR, signal strength, and MSE in the heteroskedastic setting. The results provide a robust toolkit for high-dimensional inference in linear models with unequal noise variances, with potential extensions to nonlinear settings and heteroskedasticity testing.

Abstract

Drawing statistical inferences from large datasets in a model-robust way is an important problem in statistics and data science. In this paper, we propose methods that are robust to large and unequal noise in different observational units (i.e., heteroskedasticity) for statistical inference in linear regression. We leverage the Hadamard estimator, which is unbiased for the variances of the ordinary least-squares coefficient estimates. This is in contrast to the popular White's sandwich estimator, which can be substantially biased in high dimensions. We propose to estimate the signal strength, noise level, signal-to-noise ratio, and mean squared error via the Hadamard estimator. We develop a new degrees-of-freedom adjustment that gives more accurate confidence intervals than variants of White's sandwich estimator. Moreover, we provide conditions ensuring the estimator is well-defined, by studying a new random matrix ensemble in which the entries of a random orthogonal projection matrix are squared. We also show approximate normality, using the second-order Poincaré inequality. Our work provides improved statistical theory and methods for linear regression in high dimensions.
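
A minimal NumPy sketch of the estimator described in the abstract, assuming independent, zero-mean errors with variances $\sigma_i^2$ and a full-column-rank design $X$. Write $S = (X^\top X)^{-1}X^\top$ and $Q = I - XS$, so the residuals are $\hat\varepsilon = Qy$ and $\mathbb{E}[\hat\varepsilon^{\odot 2}] = (Q\odot Q)\sigma^2$. Solving this linear system with the observed squared residuals gives an unbiased estimate of $\sigma^2$, and plugging it into $\mathrm{Var}(\hat\beta_j) = \sum_i S_{ji}^2\sigma_i^2$ gives an unbiased variance estimate; White's (HC0) estimator instead plugs in $\hat\varepsilon^{\odot 2}$ directly. The function names are hypothetical, and this is an illustration, not the authors' implementation.

```python
import numpy as np


def ols_pieces(X):
    """Return S = (X^T X)^{-1} X^T (p x n) and the residual projection Q = I - X S (n x n)."""
    n = X.shape[0]
    S = np.linalg.solve(X.T @ X, X.T)
    Q = np.eye(n) - X @ S
    return S, Q


def white_variance(X, Y):
    """White (HC0) estimate of Var(beta_hat_j): squared residuals stand in for sigma_i^2."""
    S, Q = ols_pieces(X)
    resid_sq = (Q @ Y) ** 2            # Y may be one response (n,) or a stack of responses (n, m)
    return (S * S) @ resid_sq


def hadamard_variance(X, Y):
    """Hadamard estimate of Var(beta_hat_j); assumes Q * Q is invertible."""
    S, Q = ols_pieces(X)
    resid_sq = (Q @ Y) ** 2
    # For independent zero-mean errors, E[resid_sq] = (Q * Q) @ sigma2, so solving
    # this linear system yields an unbiased estimate of the noise variances sigma2.
    sigma2_hat = np.linalg.solve(Q * Q, resid_sq)
    # Var(beta_hat_j) = sum_i S[j, i]**2 * sigma2[i]; the map is linear, so it stays unbiased.
    return (S * S) @ sigma2_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, reps = 40, 5, 20_000
    X = rng.standard_normal((n, p))
    sigma2 = rng.uniform(0.5, 3.0, size=n)                  # made-up heteroskedastic variances
    eps = rng.standard_normal((n, reps)) * np.sqrt(sigma2)[:, None]
    Y = X @ np.ones((p, 1)) + eps                           # one simulated response per column
    S, Q = ols_pieces(X)
    true_var = (S * S) @ sigma2
    # Monte Carlo averages divided by the true variances: the Hadamard ratios should be
    # close to 1; White's ratios track (S*S) @ (Q*Q) @ sigma2 / true_var instead.
    print(hadamard_variance(X, Y).mean(axis=1) / true_var)
    print(white_variance(X, Y).mean(axis=1) / true_var)
```

Because the variance estimate comes from solving a linear system rather than averaging squares, nothing in this sketch forces individual estimates to be nonnegative.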

Paper Structure

This paper contains 43 sections, 21 theorems, 107 equations, 10 figures, 2 tables.

Key Result

Proposition 2.1

If the Hadamard product $Q \odot Q$ is invertible, then the sample size $n$ must be at least

Figures (10)

  • Figure 1: Mean type-I error over all coordinates.
  • Figure 2: Mean type-I error for each coordinate over 1000 simulations.
  • Figure 3: Mean type-I error in the first coordinate and second coordinate over 1000 simulations each. The error bars represent 95% Clopper-Pearson intervals for the coverage.
  • Figure 4: Bias in estimating MSE.
  • Figure 5: Distribution of $z$-scores of a fixed coordinate of the Hadamard estimator.
  • ...and 5 more figures
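
Figure 3's error bars are 95% Clopper-Pearson intervals for an empirical rejection rate. A minimal sketch of that interval follows, assuming SciPy is available; the helper name `clopper_pearson` and the example counts are hypothetical, not taken from the paper. The limits are quantiles of Beta distributions, with the endpoints pinned to 0 or 1 when no simulation or every simulation rejects.

```python
from scipy.stats import beta


def clopper_pearson(k, n, level=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k / n."""
    alpha = 1.0 - level
    # Lower limit: Beta(k, n - k + 1) quantile; upper limit: Beta(k + 1, n - k) quantile.
    lo = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return float(lo), float(hi)


# Hypothetical example: 62 rejections in 1000 simulated tests at nominal level 0.05.
print(clopper_pearson(62, 1000))
```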

Theorems & Definitions (30)

  • Proposition 2.1: Lower bound
  • Theorem 1
  • Corollary 2.2
  • Theorem 2: Eigenvalue bounds for the Hadamard product with a random design
  • Proposition 2.3: Degrees of freedom
  • Proposition 2.4: Bias of classical estimators
  • Proposition 3.1
  • Lemma 3.2
  • Proof of Proposition (label: prop:hadamard_full)
  • Definition 3.3
  • ...and 20 more
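
The lower-bound and eigenvalue results listed above concern when $Q\odot Q$ is invertible and how well conditioned it is. A rough numerical check, assuming a Gaussian random design, is sketched below; it does not reproduce the paper's bounds. It relies on two elementary facts: writing $Q = UU^\top$ with $r = n-p$ orthonormal columns shows that $\operatorname{rank}(Q\odot Q) \le r(r+1)/2$, so $Q\odot Q$ is singular whenever $r(r+1)/2 < n$; and $Q\odot Q = I - 2\,\mathrm{diag}(h) + H\odot H \succeq (1 - 2\max_i h_i)\,I$, where $H = XS$ and $h$ holds the leverages, so $Q\odot Q$ is positive definite whenever every leverage is below $1/2$. The function names are hypothetical.

```python
import numpy as np


def residual_projection(X):
    """Q = I - X (X^T X)^{-1} X^T, the projection onto the orthogonal complement of col(X)."""
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)


def invertibility_report(X):
    """Numerically summarize whether Q * Q (the entrywise square of Q) is invertible."""
    n, p = X.shape
    Q = residual_projection(X)
    QQ = Q * Q
    eig = np.linalg.eigvalsh(QQ)      # ascending; Q * Q is symmetric PSD (Schur product theorem)
    r = n - p                         # rank of Q
    return {
        "rank": int(np.linalg.matrix_rank(QQ)),
        # Necessary condition: Q * Q can be invertible only if r(r + 1)/2 >= n.
        "rank_bound": r * (r + 1) // 2,
        # Sufficient condition: max leverage < 1/2 implies min_eig >= 1 - 2 * max_leverage > 0.
        "max_leverage": float(np.max(1.0 - np.diag(Q))),
        "min_eig": float(eig[0]),
        "cond": float(eig[-1] / eig[0]) if eig[0] > 0 else float("inf"),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n, p in [(200, 20), (200, 80), (20, 15)]:
        print(n, p, invertibility_report(rng.standard_normal((n, p))))
```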