Dimension-agnostic inference using cross U-statistics

Ilmun Kim; Aaditya Ramdas

Abstract

Classical asymptotic theory for statistical inference usually calibrates a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated to understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures depending on the assumptions made about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics, along with sample splitting and self-normalization, to produce a refined test statistic with a Gaussian limiting distribution regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We illustrate our technique on some classical problems, including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
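To make the recipe in the abstract concrete, here is a minimal sketch of how sample splitting and self-normalization might be combined for the one-sample mean problem, testing $H_0: \mathbb{E}[X] = 0$. The abstract does not give the formula, so the details are assumptions for illustration: the function name `cross_u_mean_test`, the even split, the use of the second-half sample mean as the projection direction, the unbiased variance estimate (`ddof=1`), and the one-sided Gaussian threshold are all choices made here, not a statement of the authors' exact procedure.

```python
import numpy as np
from statistics import NormalDist


def cross_u_mean_test(X, alpha=0.05):
    """Sketch of a studentized cross U-statistic test of H0: E[X] = 0.

    X is an (n, d) array. Returns (statistic, reject). Hypothetical helper,
    written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m = n // 2
    X1, X2 = X[:m], X[m:]  # sample splitting into two halves

    # Keep only the off-diagonal block: each first-half point is paired with
    # the second-half sample mean, so no within-half products appear.
    f = X1 @ X2.mean(axis=0)  # shape (m,)

    # Self-normalization: studentize the mean of the scalar projections.
    t_stat = np.sqrt(m) * f.mean() / f.std(ddof=1)

    # Calibrate against a standard Gaussian quantile (one-sided).
    threshold = NormalDist().inv_cdf(1 - alpha)
    return t_stat, bool(t_stat > threshold)
```

For the abstract's example of 100 samples in 20 dimensions, one would call `cross_u_mean_test(X)` on a `(100, 20)` array. The threshold is the same whatever the value of `d`, and the statistic is unchanged when `X` is multiplied by a positive constant because the numerator and denominator scale together.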
