Gaussian universality for approximately polynomial functions of high-dimensional data

Kevin Han Huang; Morgane Austern; Peter Orbanz

TL;DR

This work establishes nearly optimal Gaussian universality bounds for approximately polynomial estimators of high-dimensional data, extending the invariance principle to vectors with dependent coordinates and to non-multilinear forms. The upper and lower bounds reveal a sharp threshold in the degree m: universality holds up to m = o(log n). Approximately polynomial functions are handled through a variance domination framework together with L2-approximation errors. Applications include a necessary and sufficient condition for asymptotic normality, an example of bootstrap inconsistency, distributional results for high-dimensional U- and V-statistics, a phase-transition analysis for MMD in imbalanced two-sample tests, non-classical Berry-Esséen bounds, and a non-classical delta method. Together, the results clarify when Gaussian approximations remain valid in high dimensions, how dimension interacts with the degree of the estimator, and how these insights carry over to statistical tools for non-Gaussian inputs.

Abstract

Gaussian universality results assert that the properties of many estimators remain unchanged when the input data are replaced by Gaussians. Such results have gained popularity in high-dimensional statistics and machine learning, as Gaussianity often substantially simplifies downstream analyses. Yet it remains an open question when universality ceases to hold. To address this, we establish nearly optimal upper and lower bounds on the Gaussian universality approximation error, measured in Kolmogorov distance, over the class of approximately polynomial functions of high-dimensional random vectors. The upper bounds adapt the invariance principle of Mossel, O'Donnell and Oleszkiewicz (2010) to high-dimensional vectors and to functions beyond multilinear forms. As applications, we obtain a delta method for high-dimensional data with non-Gaussian limits, a necessary and sufficient condition for asymptotic normality, and simple estimators that are asymptotically normal but for which the bootstrap fails to be consistent. We also extend recent results on the high-dimensional degeneracy of non-degenerate U-statistics, the phase transition of MMD in two-sample tests with imbalanced data, and confidence spheres for high-dimensional averages. Our lower bound is constructive and shows that, for polynomials of even degree $m$, universality holds up to $m=o(\log n)$. As a corollary, the Gaussian polynomial approximation error of $\Omega(n^{-1/6m})$ is not improvable for even-degree U-statistics and V-statistics. Our results also explain how universality results for U-statistics and V-statistics differ significantly in their dependence on dimensions.
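To make the notion of universality in Kolmogorov distance concrete, the following minimal sketch simulates a degree-2 U-statistic under Rademacher and Gaussian inputs with matching mean and variance, then compares the two empirical distributions. Everything here is an illustrative assumption rather than the authors' construction: the kernel $h(a, b) = \langle a, b \rangle$, the sample sizes, and the helper names `u_statistic`, `kolmogorov_distance` and `simulate` are hypothetical. The reported number is a Monte Carlo estimate, not one of the paper's bounds.

```python
import numpy as np


def u_statistic(x):
    """Degree-2 U-statistic with kernel h(a, b) = <a, b>; x has shape (n, d)."""
    n = x.shape[0]
    s = x.sum(axis=0)
    # ||sum_i x_i||^2 - sum_i ||x_i||^2 equals the sum over i != j of <x_i, x_j>.
    off_diag = s @ s - np.sum(x * x)
    return off_diag / (n * (n - 1))


def kolmogorov_distance(a, b):
    """Sup-distance between the empirical CDFs of two 1-D samples."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def simulate(n, d, reps, rng):
    """Draw `reps` copies of the statistic under Rademacher and Gaussian inputs."""
    rad = np.empty(reps)
    gau = np.empty(reps)
    for r in range(reps):
        rad[r] = u_statistic(rng.choice([-1.0, 1.0], size=(n, d)))
        gau[r] = u_statistic(rng.standard_normal((n, d)))
    return rad, gau


# Assumed toy configuration: n = 50 samples in d = 20 dimensions.
rng = np.random.default_rng(0)
rad, gau = simulate(n=50, d=20, reps=2000, rng=rng)
dist = kolmogorov_distance(rad, gau)
print(f"empirical Kolmogorov distance: {dist:.3f}")
```

Varying `n`, `d`, or the degree of the kernel in such a simulation gives a rough empirical feel for how the approximation error behaves, although only the paper's bounds characterise the actual rates.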

Paper Structure (49 sections, 526 equations)

This paper contains 49 sections, 526 equations.

Introduction
Related work
Main results
Universality of multilinear polynomials
Approximately polynomial functions
Universality of symmetric non-multilinear polynomials and a discussion on dimension dependence
Nearly matching lower bound and dependence on the degree $m$
Applications
Necessary and sufficient condition for asymptotic normality
An asymptotically normal statistic for which bootstrap is inconsistent
Distribution approximations of degree-$m$ U-statistics of high-dimensional data
Phase transitions of MMD in two-sample tests with imbalanced data
Non-classical Berry-Esséen bounds for high-dimensional averages and data with non-invertible covariance matrices
Non-classical delta method for high dimensional data
Additional results
...and 34 more sections

Theorems & Definitions (22)

proof
proof: Proof of \ref{prop:gaussian}
proof
proof: Proof of \ref{lem:assumption:taylor}
proof: Proof of \ref{lem:main:taylor}
proof: Proof of \ref{lem:gaussian:linear:moment}
proof: Proof of \ref{thm:main} and \ref{cor:non:multilinear}
proof: Proof of \ref{lem:approx:XplusY:by:Y}
proof: Proof of \ref{thm:VD}
proof: Proof of \ref{lem:lower:V}
...and 12 more
