Singularity-agnostic incomplete U-statistics for testing polynomial constraints in Gaussian covariance matrices

Dennis Leung, Nils Sturma

Abstract

Testing the goodness-of-fit of a model through the functional constraints that define it in the parameters dates back to Spearman (1927), who analyzed the famous "tetrad" polynomial in the covariance matrix of the observed variables in a single-factor model. Despite this long history, the Wald test typically employed to operationalize the approach can produce very inaccurate test sizes in many situations, even when the regularity conditions for the classical normal asymptotics are met and a very large sample is available. Focusing on testing a polynomial constraint in a Gaussian covariance matrix, we obtain a new understanding of this baffling phenomenon: when the null hypothesis is true but "near-singular", the standardized Wald test exhibits slow weak convergence, owing to the sophisticated dependency structure inherent to the underlying U-statistic that ultimately drives its limiting distribution; this can also be rigorously explained by a key ratio of moments encoded in the Berry-Esseen bound quantifying the normal approximation error involved. As an alternative, we advocate the use of an incomplete U-statistic to mildly tone down this dependence and render the speed of convergence agnostic to the singularity status of the hypothesis. In parallel, we develop a Berry-Esseen bound that is mathematically descriptive of the singularity-agnostic nature of our standardized incomplete U-statistic, using some of the finest exponential-type inequalities in the literature.

Paper Structure (40 sections, 19 theorems, 203 equations, 1 figure)

Key Result

Theorem 2.1

For $r \in \mathbb{N}$, let $Y_1, \dots, Y_{2r}$ be jointly normal variables that are all centered. Then

$$\mathbb{E}[Y_1 \cdots Y_{2r}] = \sum_{\mathcal{J}} \prod_{\ell = 1}^{r} \mathbb{E}[Y_{u_\ell} Y_{v_\ell}],$$

where the summation is over all partitions $\mathcal{J} = \{\{u_1, v_1\}, \dots, \{u_r, v_r\}\}$ of $[2r] = \{1, \dots, 2r\}$ into disjoint pairs $\{u_\ell, v_\ell\}\in [2r]^2$; in particular, there are $(2r)!/(2^r r!)$ distinct pairings of $[2r]$.
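
As an illustration of Isserlis' theorem, the following minimal Python sketch enumerates all pairings of $[2r]$, checks the count $(2r)!/(2^r r!)$, and evaluates the product moment as the sum over pairings of products of pairwise covariances. The helper names `pairings` and `isserlis_moment` and the covariance matrix `cov` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from math import factorial

import numpy as np


def pairings(items):
    """Yield every partition of `items` (a list of even length) into pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Pair the first element with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def isserlis_moment(cov):
    """E[Y_1 ... Y_{2r}] for centered jointly normal Y with covariance `cov`.

    Assumes `cov` is a square matrix of even dimension 2r.
    """
    idx = list(range(cov.shape[0]))
    return sum(
        np.prod([cov[u, v] for u, v in pairing]) for pairing in pairings(idx)
    )


r = 2
n_pairings = sum(1 for _ in pairings(list(range(2 * r))))
# Number of pairings should match (2r)! / (2^r r!).
assert n_pairings == factorial(2 * r) // (2**r * factorial(r))

# Hypothetical 4x4 covariance matrix, used only for illustration.
cov = np.array([
    [1.0, 0.3, 0.2, 0.1],
    [0.3, 1.0, 0.4, 0.2],
    [0.2, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
exact = isserlis_moment(cov)

# For r = 2 the sum has three terms: s12*s34 + s13*s24 + s14*s23.
closed_form = (
    cov[0, 1] * cov[2, 3] + cov[0, 2] * cov[1, 3] + cov[0, 3] * cov[1, 2]
)
```

The recursive enumeration is exponential in $r$, so this sketch is only meant for small $r$, such as the fourth-order moments ($r = 2$) that arise for quadratic polynomials in covariance entries.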

Figures (1)

  • Figure 1.1: The empirical test sizes (produced by $1000$ repeated experiments) of two types of statistics with critical values calibrated based on their asymptotic null distribution $\mathcal{N}(0,1)$, plotted against various target nominal levels. These statistics test the particular tetrad $f(\Theta) = \theta_{14}\theta_{23} - \theta_{13} \theta_{24}$, and are computed with a Gaussian data sample of size $n = 100$ generated as in \ref{iid_data_sample}, with a $4$-by-$4$ covariance matrix $\Theta$ having a one-factor structure as in \ref{one_factor_model}. The entries in the loading matrix $L$ are all taken to be $0.2$, with the uniqueness matrix $\Psi$ picked so that the diagonal entries of $\Theta$ are $1$. $T_f$ and $\hat{T}_f$ are the standardized and studentized Wald test statistics. With a computational budget of $N = 2n$, $\sqrt{n}U_{n, N}'/\sigma$ and $\sqrt{n}U_{n, N}'/\hat{\sigma}$ are the incomplete U-statistics defined in \ref{icu_def}, respectively normalized by the true limiting variance $\sigma^2$ in \ref{rescaling_factor} and a data-driven estimate $\hat{\sigma}^2$ of it.
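
The data-generating setup described in the caption of Figure 1.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the one-factor structure is taken to be the standard form $\Theta = LL^\top + \Psi$ with diagonal $\Psi$, the data are assumed centered, and the random seed and the plug-in estimate `tetrad_hat` are illustrative choices. It does not reproduce the paper's Wald or incomplete U-statistics, whose definitions are not given in this summary.

```python
import numpy as np


def tetrad(theta):
    """f(Theta) = theta_14 * theta_23 - theta_13 * theta_24 (1-based indices)."""
    return theta[0, 3] * theta[1, 2] - theta[0, 2] * theta[1, 3]


# Assumed one-factor structure Theta = L L^T + Psi, all loadings equal to 0.2.
loadings = np.full((4, 1), 0.2)
common = loadings @ loadings.T
# Uniquenesses chosen so that every diagonal entry of Theta equals 1.
psi = np.diag(1.0 - np.diag(common))
theta = common + psi

# Draw n = 100 centered Gaussian observations; the seed is arbitrary.
rng = np.random.default_rng(0)
n = 100
sample = rng.multivariate_normal(np.zeros(4), theta, size=n)

# Plug-in estimate of the tetrad from the sample covariance (known zero mean).
theta_hat = sample.T @ sample / n
tetrad_hat = tetrad(theta_hat)
```

Under this construction every off-diagonal entry of $\Theta$ equals $0.04$, so the tetrad vanishes exactly at the true $\Theta$ and the null hypothesis holds, while `tetrad_hat` fluctuates around zero from sample to sample.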

Theorems & Definitions (33)

  • Theorem 2.1: Isserlis' theorem
  • Lemma 2.2: Hypercontractivity of polynomials in jointly normal random variables
  • Theorem 3.1: "Singularity-agnostic" Berry-Esseen bound for $U'_{n, N}$
  • Lemma 3.2: "Consequences" of Bernstein inequalities for the $U_n^{(r)}$'s
  • Proof: Proof of Theorem \ref{thm:main}
  • Lemma 3.3: B-E bound for $B_n$
  • Lemma 3.4: A nonuniform B-E bound for $U_n$
  • Definition 4.1: $\|\cdot\|_{\mathcal{J}}$-norm for a partition $\mathcal{J} \in \mathcal{P}_I$
  • Theorem 4.1: Sharp moment inequality for canonical generalized decoupled U-statistics
  • Theorem 4.2: Bernstein inequality for $U_n^{(r)}$
  • ...and 23 more