Uniform-over-dimension convergence with application to location tests for high-dimensional data

Joydeep Chowdhury, Subhajit Dutta, Marc G. Genton

Abstract

Asymptotic methods for hypothesis testing in high-dimensional data usually require the dimension of the observations to increase to infinity, often with an additional condition on its rate of increase relative to the sample size. On the other hand, multivariate asymptotic methods are valid for fixed dimension only, and their practical implementations in hypothesis testing typically require the sample size to be large compared to the dimension to yield desirable results. However, in practical scenarios, it is usually not possible to determine whether the dimension of the data at hand conforms to the conditions required for the validity of the high-dimensional asymptotic methods, or whether the sample size is large enough compared to the dimension of the data. In this work, a theory of asymptotic convergence is proposed, which holds uniformly over the dimension of the random vectors. This theory attempts to unify the asymptotic results for fixed-dimensional multivariate data and high-dimensional data, and accounts for the effect of the dimension of the data on the performance of the hypothesis testing procedures. The methodology developed from this asymptotic theory can be applied to data of any dimension. An application of this theory is demonstrated in the two-sample test for the equality of locations. The proposed test statistic is not scaled by the sample covariance, as is usual for tests for high-dimensional data. Using simulated examples, it is demonstrated that the proposed test performs better than several popular tests in the literature for high-dimensional data. Further, it is demonstrated in simulated models that the proposed unscaled test performs better than the usual scaled two-sample tests for multivariate data, including Hotelling's $T^2$ test for multivariate Gaussian data.
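
To make the distinction between scaled and unscaled two-sample statistics concrete, the following is a minimal sketch and not the authors' procedure. It assumes the unscaled statistic is the squared Euclidean distance between the two sample means, and it calibrates the test by permutation rather than by the asymptotic theory of the paper. The function names `unscaled_statistic`, `hotelling_t2` and `permutation_pvalue` are hypothetical, introduced only for illustration.

```python
import numpy as np


def unscaled_statistic(x, y):
    """Squared Euclidean distance between sample means (no covariance scaling).

    Assumed form for illustration; the statistic in the paper may differ.
    """
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)


def hotelling_t2(x, y):
    """Classical two-sample Hotelling's T^2, scaled by the pooled covariance.

    Requires the pooled covariance to be invertible, so it is only usable
    when the dimension is small relative to n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x, rowvar=False))
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff)))


def permutation_pvalue(x, y, statistic, n_perm=200, seed=0):
    """Permutation p-value for a two-sample statistic (large values reject).

    Permutation calibration is an assumption of this sketch, chosen because
    it needs no distributional theory.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    combined = np.vstack([x, y])
    n1 = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(combined))
        if statistic(combined[perm[:n1]], combined[perm[n1:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Sample sizes as in the figure captions; the dimension 200 is arbitrary.
    x = rng.normal(size=(40, 200))
    y = rng.normal(size=(50, 200)) + 0.5  # location shift in every coordinate
    print(permutation_pvalue(x, y, unscaled_statistic))
```

With $n_1 = 40$, $n_2 = 50$ and dimension 200, the pooled sample covariance is singular, so `hotelling_t2` cannot be computed in this setting, while the unscaled statistic remains well defined for any dimension.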

Paper Structure (10 sections, 10 theorems, 115 equations, 3 figures, 3 tables)


Key Result

Lemma 1

Under Assumption 1, given any Borel set $A \in \mathcal{R}^d$, which is a $\mu_p$-continuity set for all $p$, and any $\epsilon > 0$, there are bounded and Lipschitz continuous functions $g, h : \mathbb{R}^d \to \mathbb{R}$ such that $g \le I_A \le h$ and $\sup_p \int (h - g) \mathrm{d} \m

Figures (3)

  • Figure 1: Estimated powers at nominal level 5% ('$\cdots$') for the ZGZC2020 test ('$\boldsymbol{\leftrightline}$'), the SS test ('$\boldsymbol{-}\boldsymbol{-}$'), the BS1996 test ('$-\circ-$'), the CLX2014 test ('$-\smalltriangleup-$'), the CQ2010 test ('$-+-$') and the SD2008 test ('$-\smalldiamond-$') based on 1000 independent replications for $n_1 = 40$ and $n_2 = 50$.
  • Figure 2: Estimated powers at nominal level 5% ('$\cdots$') for the ZGZC2020 test ('$\leftrightline$'), the HT2 test ('$\leftrightline$'), the SS test ('$--$') and the CM1997 test ('$--$') based on 1000 independent replications for $n_1 = 40$ and $n_2 = 50$.
  • Figure 3: Histograms of p-values of the tests in the colon data.

Theorems & Definitions (21)

  • Definition 1
  • Lemma 1
  • Theorem 1: Uniform-over-$p$ Portmanteau theorem
  • Theorem 2: Uniform-over-$p$ Lévy's continuity theorem
  • Theorem 3: Uniform-over-$p$ continuous mapping theorem
  • Definition 2
  • Lemma 2
  • Theorem 4: Uniform-over-$p$ Slutsky's theorem
  • Lemma 3
  • Theorem 5
  • ...and 11 more