Many-sample tests for the dimensionality hypothesis for large covariance matrices among groups

Tianxing Mei, Chen Wang, Jianfeng Yao

TL;DR

The paper tackles the problem of inferring the dimensionality $d$ of the linear span of covariance matrices across many groups in a high-dimensional setting. It introduces a determinant-based distance via the population Gram matrix and builds a generalized $U$-statistic to estimate this distance, proving its asymptotic normality under both null and alternative hypotheses. The resulting testing framework yields a scale-invariant procedure for testing $H_0: d=d_0$ against $H_1: d>d_0$, with explicit power results in the presence of outlier covariance matrices. The method is validated through Monte Carlo experiments and applied to Mouse Aging Project data, where it indicates a dimensionality of $d=3$ and supports a three-term Kronecker-product representation. Overall, this work provides a principled, scalable approach for assessing covariance structure across a growing number of populations in high dimensions, with practical implications for genomics and related fields.

Abstract

In this paper, we consider procedures for testing hypotheses on the dimension of the linear span generated by a growing number of $p\times p$ covariance matrices from $q$ independent populations. Under a proper limiting scheme where all the parameters, $q$, $p$, and the sample sizes from the $q$ populations, are allowed to increase to infinity, we derive the asymptotic normality of the proposed test statistics. The proposed test procedures show satisfactory performance in finite samples under both the null and the alternative. We also apply the proposed many-sample dimensionality test to investigate a matrix-valued gene dataset from the Mouse Aging Project and gain some new knowledge about its covariance structures.

Paper Structure

This paper contains 16 sections, 12 theorems, 148 equations, 2 figures, and 2 tables.

Key Result

Lemma 2.1

Suppose that the dimension of $\mathcal{H}_0$ is $d$. Then, for any $k\leq d$, $M^{(k)}_p$ is positive; and for any $k > d$, $M^{(k)}_p = 0$.
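
The mechanism behind a statement of this kind can be illustrated numerically. The exact definition of $M^{(k)}_p$ is not reproduced in this summary, so the sketch below uses a stand-in that is purely an assumption: the average of all $k\times k$ principal minors of the Gram matrix of the covariance matrices under the normalized trace inner product $\langle A, B\rangle = \mathrm{tr}(AB)/p$. It relies only on the standard fact that a Gram determinant of $k$ elements is nonnegative and vanishes exactly when they are linearly dependent. The names `random_spd` and `mean_gram_minor` and the sizes `p`, `q`, `d` are hypothetical choices for illustration, and the code works with population matrices rather than sample estimates.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
p, q, d = 6, 5, 3  # matrix size, number of groups, span dimension (illustrative)


def random_spd(p, rng):
    """Return a random symmetric positive definite p x p matrix."""
    a = rng.standard_normal((p, p))
    return a @ a.T / p + np.eye(p)


# Build q covariance matrices as positive combinations of d basis matrices,
# so their linear span has dimension d (almost surely).
basis = [random_spd(p, rng) for _ in range(d)]
weights = rng.uniform(0.5, 1.5, size=(q, d))
sigmas = [sum(w * b for w, b in zip(row, basis)) for row in weights]

# Gram matrix under the normalized trace inner product <A, B> = tr(AB) / p.
gram = np.array([[np.trace(a @ b) / p for b in sigmas] for a in sigmas])


def mean_gram_minor(gram, k):
    """Average determinant over all k x k principal submatrices of gram.

    Assumed stand-in for a determinant-based quantity, not the paper's
    definition of M_p^{(k)}.
    """
    n = gram.shape[0]
    dets = [
        np.linalg.det(gram[np.ix_(idx, idx)])
        for idx in itertools.combinations(range(n), k)
    ]
    return float(np.mean(dets))


for k in range(1, q + 1):
    print(k, mean_gram_minor(gram, k))
```

In this construction the printed values should be clearly positive for $k\le 3$ and zero up to floating-point error for $k=4,5$, which mirrors the positive-versus-zero dichotomy in the lemma.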

Figures (2)

  • Figure 1: Empirical power plot of the many-sample dimensionality test. Case (a) (left) and (b) (right) with two different noises: normal noise (red square) and Gamma noise (blue triangle). Curves with black circles stand for theoretical power functions in (\ref{exm:1}) and (\ref{exm:2}), respectively.
  • Figure 2: Scatter plot (a) and histogram (b) of the difference $RSS_3-RSS_1$ over 1000 experiments.

Theorems & Definitions (20)

  • Lemma 2.1
  • Lemma 2.2
  • proof
  • Lemma 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Remark 1
  • Lemma 2.6
  • Theorem 2.7
  • Remark 2
  • ...and 10 more