
Testing for practically significant dependencies in high dimensions via bootstrapping maxima of U-statistics

Patrick Bastian, Holger Dette, Johannes Heiny

TL;DR

The paper provides an asymptotic and a bootstrap level-α test, valid in the high-dimensional regime, for the null hypothesis that all pairwise associations do not exceed a given threshold in absolute value. The tests apply to a broad class of dependence measures that can be estimated by U-statistics.

Abstract

This paper takes a different look at the problem of testing the mutual independence of the components of a high-dimensional vector. Instead of testing whether all pairwise associations (e.g. all pairwise Kendall's $\tau$) between the components vanish, we are interested in the (null) hypothesis that all pairwise associations do not exceed a certain threshold in absolute value. The consideration of these hypotheses is motivated by the observation that in the high-dimensional regime, it is rare, and perhaps impossible, to have a null hypothesis that can be exactly modeled by assuming that all pairwise associations are precisely equal to zero. The formulation of the null hypothesis as a composite hypothesis makes the problem of constructing tests non-standard, and in this paper we provide a solution for a broad class of dependence measures that can be estimated by $U$-statistics. In particular, we develop an asymptotic and a bootstrap level-$\alpha$ test for the new hypotheses in the high-dimensional regime. We also prove that the new tests are minimax-optimal and investigate their finite sample properties by means of a small simulation study and a data example.
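
To make the hypotheses concrete, the following is a minimal sketch, not the authors' procedure. It computes Kendall's $\tau$ for each pair of components as a $U$-statistic of order 2 and compares the largest absolute value with a threshold. The function names (`kendall_tau`, `max_abs_pairwise_tau`, `exceeds_threshold`) and the parameter `delta` are hypothetical, ties are not corrected for, and the comparison is a plain plug-in check: the paper's tests calibrate a critical value asymptotically or by bootstrap, which is not reproduced here.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau as a U-statistic of order 2 (no tie correction)."""
    n = len(x)
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return 2.0 * total / (n * (n - 1))


def max_abs_pairwise_tau(rows):
    """Largest |tau| over all pairs of components.

    rows: list of n observations, each a sequence of p components.
    """
    p = len(rows[0])
    cols = [[r[k] for r in rows] for k in range(p)]
    return max(
        abs(kendall_tau(cols[a], cols[b]))
        for a, b in combinations(range(p), 2)
    )


def exceeds_threshold(rows, delta):
    """Plug-in check only: ignores sampling variability (assumption)."""
    return max_abs_pairwise_tau(rows) > delta
```

With $p$ components there are $p(p-1)/2$ pairs, so the maximum runs over a number of statistics that grows quadratically in the dimension. This is why the distribution of the maximum, rather than of a single $U$-statistic, has to be approximated.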

Paper Structure

This paper contains 33 sections, 36 theorems, 322 equations, 3 figures, and 5 tables.

Key Result

Theorem 2.2

If Assumptions (A1), (A2), (A3) are satisfied, $\log d=o(n^\gamma)$ with $0\leq \gamma \leq \frac{1}{2/\beta+1}$ and [...], then, for any $\alpha \in (0,1-e^{-1})$, [...], with strict inequality whenever $\limsup_{n\to \infty} | \{ i \in \{ 1,\ldots , d\} : | \theta_i| =\Delta \} |/d < 1$. Moreover, [...]

Figures (3)

  • Figure 1: Simulated rejection probabilities of the test \ref{boottest} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.
  • Figure 2: Simulated rejection probabilities of the test \ref{boottestnv} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.
  • Figure 3: Simulated rejection probabilities of the test \ref{boottestabs} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.

Theorems & Definitions (65)

  • Example 2.1
  • Theorem 2.2
  • Remark 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Remark 2.6
  • Remark 2.7: an alternative test
  • Theorem 2.8
  • Remark 2.9: Testing various thresholds and confidence intervals
  • Remark 2.10: Reversed hypotheses
  • ...and 55 more