Inference on testing the number of spikes in a high-dimensional generalized spiked Fisher matrix

Rui Wang, Dandan Jiang

TL;DR

The paper tackles testing the number of spikes in a high-dimensional generalized spiked Fisher matrix under a two-sample framework without assuming Gaussianity or diagonal covariance. It introduces a universal test statistic based on partial linear spectral statistics and proves a central limit theorem under the null, enabling spike-count testing. The method is then applied to two practical problems: identifying the number of significant variables in large-dimensional linear regression and detecting change points in sequence data, with explicit CLTs and practical algorithms for each case. Extensive simulations across diverse settings and an empirical study on macroeconomic data demonstrate robust size, power, and real-world effectiveness. This work broadens spike-testing tools beyond classical diagonal and Gaussian assumptions, offering a flexible, theory-grounded approach for modern high-dimensional inference.

Abstract

The spiked Fisher matrix is a significant topic for two-sample problems in multivariate statistical inference. This paper is dedicated to testing the number of spikes in a high-dimensional generalized spiked Fisher matrix that relaxes the Gaussian population assumption and the diagonal constraints on the population covariance matrices. First, we propose a general test statistic predicated on partial linear spectral statistics to test the number of spikes, then establish the central limit theorem (CLT) for this statistic under the null hypothesis. Second, we apply the CLT to address two statistical problems: variable selection in high-dimensional linear regression and change point detection. For each test problem, we construct new statistics and derive their asymptotic distributions under the null hypothesis. Finally, simulations and empirical analysis are conducted to demonstrate the remarkable effectiveness and generality of our proposed methods across various scenarios.

Paper Structure

This paper contains 14 sections, 4 theorems, 51 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

Theorem 2.1

For the hypothesis testing problem on the number of spikes, suppose that the paper's assumptions (assum1 to assum3) are satisfied. Then, under the null hypothesis, the proposed test statistic satisfies a central limit theorem. The mean and variance terms, $\mu_{f,H}$ and $\nu_{f,H}$, are functions of $m_0$; their explicit expressions are given in the paper's mean and variance equations.
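
To make the idea of a partial linear spectral statistic concrete, here is a minimal sketch in Python. It is not the authors' code and it does not reproduce the paper's statistic. It assumes the standard sample Fisher matrix $F = S_1 S_2^{-1}$ built from two sample covariance matrices (with the second sample size larger than the dimension, so that $S_2$ is invertible), and it assumes the partial statistic is the sum of $f$ over the sample eigenvalues after dropping the $m_0$ largest. The function names, the simulated data, and the omission of the centering and scaling terms $\mu_{f,H}$ and $\nu_{f,H}$ are all illustrative choices.

```python
import numpy as np


def sample_fisher_eigenvalues(x, y):
    """Eigenvalues of F = S1 @ inv(S2), sorted in descending order.

    x is an (n1, p) sample and y is an (n2, p) sample. This sketch
    assumes n2 > p so that S2 is positive definite.
    """
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    # With S2 = L L^T, the matrix L^{-1} S1 L^{-T} is symmetric and has
    # the same eigenvalues as S1 S2^{-1}, so eigvalsh can be used.
    chol = np.linalg.cholesky(s2)
    a = np.linalg.solve(chol, s1)        # L^{-1} S1
    b = np.linalg.solve(chol, a.T).T     # L^{-1} S1 L^{-T}
    b = (b + b.T) / 2                    # remove rounding asymmetry
    return np.linalg.eigvalsh(b)[::-1]


def partial_lss(eigenvalues, m0, f=np.log):
    """Sum of f over the eigenvalues after dropping the m0 largest.

    The eigenvalues are assumed to be sorted in descending order.
    """
    return float(np.sum(f(eigenvalues[m0:])))


# Illustrative data (hypothetical): one inflated direction in the first sample.
rng = np.random.default_rng(0)
p, n1, n2 = 20, 200, 400
x = rng.standard_normal((n1, p))
x[:, 0] *= np.sqrt(10.0)                 # a single spiked direction
y = rng.standard_normal((n2, p))

eig = sample_fisher_eigenvalues(x, y)
stat_without_top = partial_lss(eig, m0=1)  # drops the largest eigenvalue
```

With `m0=0` and `f=np.log`, the quantity reduces to the log-determinant of the sample Fisher matrix, which gives a simple sanity check. Turning it into a test would additionally require the centering and scaling from Theorem 2.1, which are not reproduced here.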

Figures (6)

  • Figure 1: Empirical distribution of $T_{f}$ under the null in Model 1 when $f(x)=\log x$.
  • Figure 2: Empirical distribution of $T_{f}$ under the null in Model 2 when $f(x)=\log x$.
  • Figure 3: Empirical distribution of $T_{l}$ under the null in Model 3.
  • Figure 4: Empirical distribution of $T_{l}$ under the null in Model 4.
  • Figure 5: Accuracy comparisons among different methods in Models 5 and 6. The mark F indicates that the method fails.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Theorem 2.1
  • Corollary 2.1
  • Example 1
  • Example 2
  • Theorem 3.1
  • Corollary 4.1
  • Remark 1
  • proof