Inference on testing the number of spikes in a high-dimensional generalized spiked Fisher matrix
Rui Wang, Dandan Jiang
TL;DR
The paper addresses testing the number of spikes in a high-dimensional generalized spiked Fisher matrix in a two-sample framework, without assuming Gaussian populations or diagonal population covariance matrices. It introduces a general test statistic based on partial linear spectral statistics and proves a central limit theorem (CLT) for it under the null hypothesis, which makes testing the spike count possible. The method is then applied to two problems: identifying the number of significant variables in high-dimensional linear regression and detecting change points in sequence data, with a CLT under the null and a practical algorithm for each case. Simulations across a range of settings and an empirical study on macroeconomic data are used to assess the size, power, and practical usefulness of the tests. The work extends spike-testing tools beyond the classical diagonal and Gaussian assumptions.
Abstract
The spiked Fisher matrix is a significant topic for two-sample problems in multivariate statistical inference. This paper is dedicated to testing the number of spikes in a high-dimensional generalized spiked Fisher matrix that relaxes the Gaussian population assumption and the diagonal constraints on the population covariance matrices. First, we propose a general test statistic based on partial linear spectral statistics to test the number of spikes and establish the central limit theorem (CLT) for this statistic under the null hypothesis. Second, we apply the CLT to two statistical problems: variable selection in high-dimensional linear regression and change point detection. For each testing problem, we construct new statistics and derive their asymptotic distributions under the null hypothesis. Finally, simulations and an empirical analysis are conducted to demonstrate the effectiveness and generality of the proposed methods across various scenarios.
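
To make the objects named above concrete, the following Python sketch computes the eigenvalues of a sample Fisher matrix from two samples and evaluates a partial linear spectral statistic, that is, a sum of a test function over the sample eigenvalues that remain after a hypothesized number of leading spikes is set aside. It is only an illustration under stated assumptions, not the paper's statistic: the function names `fisher_eigenvalues` and `partial_lss`, the form S1 S2^{-1} for the Fisher matrix, the mean-centering done by `np.cov`, the choice of `log` as test function, and the simulated data are all assumptions made here. The centering and scaling terms required by the CLT are not reproduced.

```python
import numpy as np


def fisher_eigenvalues(x, y):
    """Eigenvalues of S1 @ inv(S2), sorted in descending order.

    x, y: arrays of shape (n1, p) and (n2, p). Assumes n2 > p so that
    the sample covariance S2 is invertible (an assumption of this sketch).
    """
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    # S1 @ inv(S2) has the same eigenvalues as the symmetric matrix
    # inv(L) @ S1 @ inv(L).T, where S2 = L @ L.T is a Cholesky factorization.
    chol = np.linalg.cholesky(s2)
    tmp = np.linalg.solve(chol, s1)       # inv(L) @ S1
    sym = np.linalg.solve(chol, tmp.T)    # inv(L) @ S1 @ inv(L).T
    return np.linalg.eigvalsh(sym)[::-1]


def partial_lss(eigs, n_spikes, f=np.log):
    """Sum of f over the eigenvalues left after dropping the n_spikes largest."""
    return float(np.sum(f(eigs[n_spikes:])))


# Hypothetical usage: one inflated direction in the first sample.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))
x[:, 0] *= 5.0
y = rng.standard_normal((300, 5))
eigs = fisher_eigenvalues(x, y)
stat = partial_lss(eigs, n_spikes=1)
```

Working with the symmetric form keeps the computed eigenvalues real and avoids forming an explicit inverse of S2. An actual test would compare a centered and scaled version of `stat` with its limiting distribution under the null hypothesis.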
