Dimension-agnostic inference using cross U-statistics

Ilmun Kim, Aaditya Ramdas

Abstract

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption about $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
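
As a rough illustration of the sample-splitting and self-normalization idea for one-sample mean testing, the Python sketch below does three things. It splits the sample into $\mathcal{X}_1$ and $\mathcal{X}_2$. It averages the inner-product kernel $h(x,y)=x^\top y$ over cross pairs only. It then studentizes over $\mathcal{X}_1$. Several choices here are our own assumptions for this sketch and are not a transcription of the paper's definitions: the kernel, the even split, the sample standard deviation (`ddof=1`), the one-sided rejection rule, and the hypothetical function name `cross_mean_test`.

```python
import numpy as np
from statistics import NormalDist


def cross_mean_test(X, alpha=0.05, m1=None):
    """Sketch of a cross (sample-split) U-statistic test of H0: mu = 0.

    Assumed setup: inner-product kernel h(x, y) = x^T y; the first m1 rows
    form X1 and the remaining rows form X2 (default: an even split).
    Returns the self-normalized statistic T and the one-sided decision.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m1 = n // 2 if m1 is None else m1
    X1, X2 = X[:m1], X[m1:]
    # f(x) = average of h(x, X_j) over X2 = x^T mean(X2); one value per row of X1.
    f = X1 @ X2.mean(axis=0)
    # Self-normalization: studentize the mean of f over X1 (ddof=1 is assumed).
    T = np.sqrt(m1) * f.mean() / f.std(ddof=1)
    # Reject for large T, since E[h(X_i, X_j)] = ||mu||^2 > 0 when mu != 0.
    reject = T > NormalDist().inv_cdf(1 - alpha)
    return float(T), bool(reject)


# Usage: 100 samples in 20 dimensions, as in the abstract's example.
rng = np.random.default_rng(0)
X_null = rng.standard_normal((100, 20))
print(cross_mean_test(X_null))        # null: T should look roughly N(0, 1)
print(cross_mean_test(X_null + 0.5))  # shifted mean: T should be large
```

The one-sided rule reflects a simple fact. For independent $X_i, X_j$ with mean $\mu$, $\mathbb{E}[X_i^\top X_j] = \|\mu\|^2$, which is positive under any alternative $\mu \neq 0$. Note that the mean of `f` over $\mathcal{X}_1$ is exactly the average of $X_i^\top X_j$ over the off-diagonal (cross) block of pairs.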

Paper Structure

This paper contains 46 sections, 15 theorems, 212 equations, and 7 figures.

Key Result

Lemma: Conditional pointwise Berry–Esseen bound for $T_{\mathrm{mean}}$

Suppose that we are under the null, where the distribution $P$ has $\mu = 0$, and assume that $0 < \mathbb{E}_P[f_{\mathrm{mean}}^2(X) \mid \mathcal{X}_2] < \infty$ almost surely (a.s.). Then there exists an absolute constant $C>0$ such that the conditional distribution of $T_{\mathrm{mean}}$ given $\mathcal{X}_2$ satisfies a Berry–Esseen bound, where we recall $\mathcal{X}_2 = \{X_i\}_{i=m_1+1}^n$.
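
The lemma conditions on $\mathcal{X}_2$. Once $\mathcal{X}_2$ is fixed, the values $f_{\mathrm{mean}}(X_i)$ for $i \le m_1$ are i.i.d., and under the null they have mean zero. As a result, $T_{\mathrm{mean}}$ behaves like a one-sample $t$-statistic whatever the value of $d$. The simulation below holds one draw of $\mathcal{X}_2$ fixed and redraws $\mathcal{X}_1$ many times. It rests on the following assumptions, which are not taken from the paper:

- $f_{\mathrm{mean}}(x) = x^\top \bar{X}_2$, the average of $x^\top X_j$ over $\mathcal{X}_2$.
- $T_{\mathrm{mean}}$ is a studentized mean.
- The data are Rademacher with $d$ much larger than $n$.

These choices and the helper names are hypothetical. The sketch only illustrates the conditional argument and says nothing about the constant $C$.

```python
import numpy as np
from statistics import NormalDist


def t_mean_given_x2(X1, X2):
    """Studentized mean of f(X_i) = X_i^T mean(X2) over the rows of X1 (assumed form)."""
    f = X1 @ X2.mean(axis=0)
    return np.sqrt(len(f)) * f.mean() / f.std(ddof=1)


def conditional_null_draws(m1=30, m2=30, d=200, reps=2000, seed=0):
    """Draws of T_mean under H0: mu = 0 with X2 held fixed across replicates."""
    rng = np.random.default_rng(seed)
    # Rademacher (+1/-1) entries: a non-Gaussian null with mean zero and d >> m1 + m2.
    X2 = rng.choice([-1.0, 1.0], size=(m2, d))  # drawn once, then conditioned on
    T = np.empty(reps)
    for r in range(reps):
        X1 = rng.choice([-1.0, 1.0], size=(m1, d))  # fresh X1 for every replicate
        T[r] = t_mean_given_x2(X1, X2)
    return T


T = conditional_null_draws()
z = NormalDist().inv_cdf(0.95)
print("rejection rate at alpha = 0.05:", np.mean(T > z))
print("mean and sd of T:", T.mean(), T.std())
```

With only $m_1 = 30$ studentized terms, the rejection rate may sit slightly above $0.05$. This is the usual small-sample gap between a $t$-type statistic and its normal limit, and the Berry–Esseen bound controls exactly that gap.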

Figures (7)

  • Figure 1: Pictorial illustration of the difference between the U-statistic and the proposed sample-split counterpart, both based on the same kernel $h(x,y)$. The U-statistic is the average of the pairwise kernel values $h(X_i, X_j)$ over all pairs of distinct observations, corresponding to all entries of the $6 \times 6$ kernel matrix except the diagonal. In contrast, the proposed sample-split statistic is the average of $h(X_i, X_j)$ over pairs drawn from two disjoint subsets, corresponding to the upper-right and lower-left blocks of the $6 \times 6$ kernel matrix. We prove that this change yields a dimension-agnostic null distribution while retaining minimax rate-optimal power.
  • Figure 2: Illustration of the data settings with a fixed conditioning set $\mathcal{X}_2$ (left) and an increasing conditioning set $\mathcal{X}_2$ (right). When $\mathcal{X}_2$ is fixed, the standard CLT on $\mathcal{X}_1$ applies conditional on $\mathcal{X}_2$. However, when $\mathcal{X}_2$ grows with $n$, proving asymptotic normality becomes nontrivial and requires additional technical effort, as explained in the section on showing asymptotic normality under sample splitting.
  • Figure 3: Illustration of confidence sets $\mathcal{C}_{n,\text{normal}}(\mathcal{X}_1, \mathcal{X}_2;\alpha)$ (left), $\mathcal{C}_{n,\text{cross}}(\mathcal{X}_1, \mathcal{X}_2;\alpha)$ (middle) and $\mathcal{C}_{n,\mathrm{sym}}(\mathcal{X}_1, \mathcal{X}_2;\alpha)$ (right) for $\mu=(\mu^{(1)},\mu^{(2)})^\top$ at $\alpha=0.05$, $0.15$ and $0.25$. These confidence sets are (asymptotically) valid independent of $d$. We focus on the bivariate case for visualization only.
  • Figure 4: Comparison of the empirical power with the theoretical (asymptotic) power of the considered tests, where $(\star)$ denotes the theoretical power of the corresponding test. Left panel: the empirical power of $\phi_{\mathrm{mean}}$ closely tracks its theoretical power expression. Right panel: the empirical power of $\phi_{\mathrm{cov}}$ closely tracks its theoretical power expression. Each plot also shows the empirical and theoretical power of $U_{\mathrm{mean}}$ and $U_{\mathrm{cov}}$, respectively.
  • Figure 5: QQ plots of $T_{\mathrm{mean}}$. The null distribution of $T_{\mathrm{mean}}$ closely follows the standard normal distribution under both scenarios, irrespective of the ratio of $n$ to $d$. This agrees with the unconditional uniform Berry–Esseen bound for $T_{\mathrm{mean}}$. The line $y=x$ is included for reference.
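
The kernel-matrix picture in Figure 1 translates directly into code. The sketch below makes three assumptions, none of which is the authors' implementation:

- The kernel $h$ is symmetric.
- $\mathcal{X}_1$ consists of the first $m_1$ rows.
- The six observations split 3 + 3, which is an assumption.

Because $h$ is symmetric, the upper-right and lower-left blocks have the same average, so the cross statistic only needs one of them. The helper names are hypothetical.

```python
import numpy as np


def kernel_matrix(X, h):
    """n x n matrix with entry (i, j) equal to h(X_i, X_j)."""
    n = len(X)
    return np.array([[h(X[i], X[j]) for j in range(n)] for i in range(n)], dtype=float)


def u_statistic(H):
    """Average over all off-diagonal entries, i.e. all ordered pairs i != j."""
    n = H.shape[0]
    return (H.sum() - np.trace(H)) / (n * (n - 1))


def cross_u_statistic(H, m1):
    """Average over the off-diagonal block pairing X1 (first m1 rows) with X2.

    For a symmetric kernel the lower-left block is the transpose of the
    upper-right one, so both blocks have the same average.
    """
    return H[:m1, m1:].mean()


# Usage mirroring Figure 1: six observations (assumed 3 + 3 split), inner-product kernel.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
H = kernel_matrix(X, np.dot)
print(u_statistic(H), cross_u_statistic(H, 3))
```

With the inner-product kernel, the cross statistic reduces to $\bar{X}_1^\top \bar{X}_2$. In that case it can be computed in $O(nd)$ time without forming the kernel matrix at all.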

Theorems & Definitions (24)

  • Remark: Relationship to distribution-free inference
  • Remark: Ambient vs. intrinsic dimension
  • Remark: Dimension-agnostic inference is easy at the expense of power
  • Lemma: Conditional pointwise Berry–Esseen bound for $T_{\mathrm{mean}}$
  • Theorem: Unconditional uniform Berry–Esseen bound for $T_{\mathrm{mean}}$
  • Remark
  • Theorem: Asymptotic power expression
  • Remark: Intuition on $\sqrt{2}$ factor
  • Theorem: Uniform type I and II error
  • Remark: Dimension-to-sample size ratio
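
Here is one back-of-the-envelope way to see where a $\sqrt{2}$ factor can come from. It is a heuristic of ours, and we do not claim it is the argument of the remark on the $\sqrt{2}$ factor listed above. Take $\mu = 0$, $h(x,y)=x^\top y$, and $X_i \sim N(0, I_d)$.

- Distinct pair terms $X_i^\top X_j$ are uncorrelated, each with variance $d$. So each statistic's variance is $d$ divided by the number of distinct pairs it averages.
- The U-statistic averages $n(n-1)/2$ pairs.
- With an even split, the cross U-statistic averages $m_1 m_2 = n^2/4$ pairs.

The variance ratio is therefore $n(n-1)/(2 m_1 m_2) \to 2$, i.e. a standard-deviation ratio near $\sqrt{2}$. The Monte Carlo sketch below checks this ratio under those assumed settings.

```python
import numpy as np


def null_variances(n=40, d=30, reps=4000, seed=0):
    """Monte Carlo variances of the U-statistic and the cross U-statistic.

    Assumed setup: X_i ~ N(0, I_d) (so mu = 0), kernel h(x, y) = x^T y,
    and an even split with m1 = m2 = n // 2 (n even).
    """
    rng = np.random.default_rng(seed)
    m1 = n // 2
    X = rng.standard_normal((reps, n, d))
    S = X.sum(axis=1)                    # (reps, d): sum of the n observations
    sq = (X ** 2).sum(axis=(1, 2))       # (reps,): sum_i ||X_i||^2
    # U-statistic: (||sum_i X_i||^2 - sum_i ||X_i||^2) / (n (n - 1)).
    U = ((S ** 2).sum(axis=1) - sq) / (n * (n - 1))
    # Cross U-statistic: mean(X1)^T mean(X2).
    C = (X[:, :m1].mean(axis=1) * X[:, m1:].mean(axis=1)).sum(axis=1)
    return U.var(), C.var()


n, d = 40, 30
var_u, var_c = null_variances(n, d)
m = n // 2
print("empirical variance ratio:", var_c / var_u)
print("heuristic ratio n(n-1)/(2 m1 m2):", n * (n - 1) / (2 * m * m))
```

This only compares null variances in one Gaussian setting. The paper's power statements concern local alternatives and more general distributions.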