Testing for practically significant dependencies in high dimensions via bootstrapping maxima of U-statistics

Patrick Bastian; Holger Dette; Johannes Heiny


TL;DR

The paper provides an asymptotic and a bootstrap level-α test, valid in the high-dimensional regime, for the null hypothesis that all pairwise associations between the components of a vector do not exceed a given threshold in absolute value. The approach covers a broad class of dependence measures that can be estimated by U-statistics.

Abstract

This paper takes a different look at the problem of testing the mutual independence of the components of a high-dimensional vector. Instead of testing whether all pairwise associations (e.g. all pairwise Kendall's $τ$) between the components vanish, we are interested in the null hypothesis that all pairwise associations do not exceed a certain threshold in absolute value. The consideration of these hypotheses is motivated by the observation that in the high-dimensional regime, it is rare, and perhaps impossible, to have a null hypothesis that can be exactly modeled by assuming that all pairwise associations are precisely equal to zero. The formulation of the null hypothesis as a composite hypothesis makes the problem of constructing tests non-standard, and in this paper we provide a solution for a broad class of dependence measures that can be estimated by $U$-statistics. In particular, we develop an asymptotic and a bootstrap level-$α$ test for the new hypotheses in the high-dimensional regime. We also prove that the new tests are minimax-optimal and investigate their finite sample properties by means of a small simulation study and a data example.
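To make the hypotheses concrete: writing $\theta_{ij}$ for the pairwise association between components $i$ and $j$ and $\Delta > 0$ for the threshold (notation assumed here for illustration), the null hypothesis described above reads $\max_{i<j} |\theta_{ij}| \leq \Delta$. The sketch below only computes a natural point estimate of that maximum for Kendall's $τ$, using its standard representation as a $U$-statistic of order 2. It is a minimal illustration under these assumptions, not the authors' test: the function names and toy data are hypothetical, and the calibration of the rejection threshold (asymptotic or bootstrap), which is the paper's contribution, is not reproduced.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau as an order-2 U-statistic.

    The kernel is sign((x_k - x_l) * (y_k - y_l)), averaged over all
    unordered pairs of observations k < l.
    """
    n = len(x)
    total = sum(
        sign((x[k] - x[l]) * (y[k] - y[l]))
        for k, l in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


def max_abs_association(columns):
    """Largest |tau_hat_ij| over all pairs of components i < j."""
    return max(
        abs(kendall_tau(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    )


# Toy data: 3 components with 5 observations each (illustrative only).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 5.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
delta = 0.1  # hypothetical threshold for a "relevant" dependence

stat = max_abs_association(columns)
print("max |tau_hat| =", stat)
print("excess over threshold =", stat - delta)
```

A positive excess alone does not justify rejecting the null hypothesis: the estimate fluctuates around the true maximum, so a level-$α$ decision needs a critical value that accounts for this fluctuation across all pairs.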


Paper Structure (33 sections, 36 theorems, 322 equations, 3 figures, 5 tables)


Introduction
Testing for relevant deviations
An asymptotic level $\alpha$ test
Bootstrap
Relevant dependencies in high-dimension
Covariance
Kendall's $\tau$
The dominating term of Spearman's $\rho$
Dependence measures with degenerate kernel
Minimax optimality
Finite sample properties
Test statistics involving $U_{ij}^2$
Test statistics involving $|U_{ij}|$
Reversed Hypotheses
Real Data Application Example
...and 18 more sections

Key Result

Theorem 2.2

If Assumptions (A1), (A2), (A3) are satisfied, $\log d=o(n^\gamma)$ with $0\leq \gamma \leq \frac{1}{2/\beta+1}$ and then, for any $\alpha \in (0,1-e^{-1})$, with strict inequality, whenever $\limsup_{n\to \infty} | \{ i \in \{ 1,\ldots , d\} : | \theta_i| =\Delta \} |/d < 1$. Moreover,

Figures (3)

Figure 1: Simulated rejection probabilities of the test \ref{boottest} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.
Figure 2: Simulated rejection probabilities of the test \ref{boottestnv} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.
Figure 3: Simulated rejection probabilities of the test \ref{boottestabs} for the hypotheses \ref{hd1b} with $\Delta=0.1$. The dimension is $p=100$, and the sample sizes are $n=50$ (left panels) and $n=100$ (right panels). Upper part: normally distributed data; lower part: $t_3$-distributed data.

Theorems & Definitions (65)

Example 2.1
Theorem 2.2
Remark 2.3
Theorem 2.4
Theorem 2.5
Remark 2.6
Remark 2.7: an alternative test
Theorem 2.8
Remark 2.9: Testing various thresholds and confidence intervals
Remark 2.10: Reversed hypotheses
...and 55 more
