Many-sample tests for the dimensionality hypothesis for large covariance matrices among groups
Tianxing Mei, Chen Wang, Jianfeng Yao
TL;DR
The paper tackles the problem of inferring the dimensionality $d$ of the linear span of covariance matrices across many groups in a high-dimensional setting. It introduces a determinant-based distance defined through the population Gram matrix and builds a generalized $U$-statistic to estimate this distance, proving its asymptotic normality under both the null and the alternative hypotheses. The resulting framework yields a scale-invariant procedure for testing $H_0: d=d_0$ against $H_1: d>d_0$, with explicit power results in the presence of outlier covariance matrices. The method is validated through Monte Carlo experiments and applied to Mouse Aging Project data, where it points to a dimensionality of $d=3$ and supports a three-term Kronecker-product representation. Overall, this work provides a principled, scalable approach for assessing covariance structure across a growing number of populations in high dimensions, with practical implications for genomics and related fields.
Abstract
In this paper, we consider procedures for testing hypotheses on the dimension of the linear span generated by a growing number of $p\times p$ covariance matrices from $q$ independent populations. Under a proper limiting scheme where all the parameters, namely $q$, $p$, and the sample sizes from the $q$ populations, are allowed to increase to infinity, we derive the asymptotic normality of the proposed test statistics. The proposed test procedures show satisfactory performance in finite samples under both the null and the alternative. We also apply the proposed many-sample dimensionality test to a matrix-valued gene dataset from the Mouse Aging Project and gain new insight into its covariance structure.
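To make the notion of "dimension of the linear span" concrete, the following is a minimal population-level sketch in Python. It is not the authors' test statistic: it assumes the covariance matrices are known exactly, uses the Frobenius inner product $\mathrm{tr}(\Sigma_i \Sigma_j)$ as the Gram entry, and recovers the span dimension as the numerical rank of that Gram matrix. The function names `gram_matrix` and `span_dimension`, the tolerance `rtol`, and the simulated design are illustrative assumptions.

```python
import numpy as np


def gram_matrix(covs):
    """Gram matrix with entries tr(S_i S_j), the Frobenius inner products."""
    flat = np.stack([np.asarray(c).ravel() for c in covs])
    return flat @ flat.T


def span_dimension(covs, rtol=1e-8):
    """Numerical rank of the Gram matrix, i.e. the dimension of span{S_1, ..., S_q}."""
    eigenvalues = np.linalg.eigvalsh(gram_matrix(covs))
    return int(np.sum(eigenvalues > rtol * eigenvalues.max()))


rng = np.random.default_rng(0)
p, q, d = 5, 8, 3  # matrix size, number of groups, span dimension (all hypothetical)

# d random positive semidefinite basis matrices.
basis = []
for _ in range(d):
    a = rng.standard_normal((p, p))
    basis.append(a @ a.T)

# Each group's covariance is a positive combination of the basis matrices.
weights = rng.uniform(0.5, 2.0, size=(q, d))
covs = [sum(w * b for w, b in zip(row, basis)) for row in weights]

print("estimated span dimension:", span_dimension(covs))
```

In this idealized setting, the Gram matrix of any $d+1$ of the matrices is singular, which is the kind of degeneracy a determinant-based distance can detect. With sampled data the Gram entries must be estimated, which is where the paper's $U$-statistic construction and its asymptotic theory come in.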
