
Gaussian universality for approximately polynomial functions of high-dimensional data

Kevin Han Huang, Morgane Austern, Peter Orbanz

TL;DR

This work establishes nearly optimal Gaussian universality bounds for high-dimensional, approximately polynomial estimators, extending the invariance principle to vectors with coordinate dependence and to non-multilinear forms. It provides both upper and lower bounds that reveal a sharp threshold in the degree m, namely m = o(log n), for universality, together with a framework based on variance domination and L2-approximation errors for handling approximate polynomials. The authors develop broad applications: a necessary and sufficient condition for asymptotic normality, an example of bootstrap inconsistency, distributional results for high-dimensional U- and V-statistics, phase-transition analyses for MMD in imbalanced two-sample tests, non-classical Berry-Esseen bounds, and a non-classical delta method. Collectively, the results clarify when Gaussian approximations remain valid in high dimensions, how dimension interacts with estimator degree, and how these insights translate into practical statistical tools under non-Gaussian inputs.
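To make the notion of Gaussian universality concrete, the following minimal Monte Carlo sketch (our illustration, not the authors' construction) takes a degree-2 polynomial of n vectors in R^d, the U-statistic-type sum over i ≠ j of <x_i, x_j>, and compares its distribution when the coordinates are Rademacher (±1) versus standard Gaussian. It estimates the Kolmogorov distance sup_t |F_1(t) − F_2(t)| between the two distributions from simulated samples. The statistic, the sizes n, d and reps, and all function names are hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_right


def rademacher(rng):
    """Symmetric +/-1 entry: mean 0, variance 1, non-Gaussian."""
    return 1.0 if rng.random() < 0.5 else -1.0


def gaussian(rng):
    """Standard normal entry: mean 0, variance 1."""
    return rng.gauss(0.0, 1.0)


def degree2_statistic(xs):
    """Standardized sum_{i != j} <x_i, x_j> over n vectors in R^d.

    Uses sum_{i != j} <x_i, x_j> = ||sum_i x_i||^2 - sum_i ||x_i||^2.
    For independent coordinates with mean 0 and variance 1 the sum has
    variance 2 n (n - 1) d, so the returned value has mean 0, variance 1.
    """
    n, d = len(xs), len(xs[0])
    s = [sum(x[k] for x in xs) for k in range(d)]
    total = sum(v * v for v in s) - sum(v * v for x in xs for v in x)
    return total / math.sqrt(2 * n * (n - 1) * d)


def simulate(draw, reps, n, d, seed):
    """Draw `reps` independent copies of the statistic with entries from `draw`."""
    rng = random.Random(seed)
    return [
        degree2_statistic([[draw(rng) for _ in range(d)] for _ in range(n)])
        for _ in range(reps)
    ]


def ks_distance(a, b):
    """Kolmogorov distance sup_t |F_a(t) - F_b(t)| between two empirical CDFs.

    Both ECDFs are right-continuous step functions, so the supremum is
    attained at one of the pooled sample points.
    """
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
        for t in a + b
    )


if __name__ == "__main__":
    n, d, reps = 20, 10, 2000  # hypothetical sizes chosen for speed
    rad = simulate(rademacher, reps, n, d, seed=1)
    gau = simulate(gaussian, reps, n, d, seed=2)
    print(f"estimated Kolmogorov distance: {ks_distance(rad, gau):.3f}")
```

Because both distributions are approximated by finite samples, the printed value carries Monte Carlo error on the order of 1/sqrt(reps); the sketch can suggest, but not certify, how close the two laws are for a given (n, d).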

Abstract

Gaussian universality results assert that the properties of many estimators remain unchanged when the input data are replaced by Gaussians. Such results have gained popularity in high-dimensional statistics and machine learning, as Gaussianity often substantially simplifies downstream analyses. Yet it remains an open question when universality may cease to hold. To address this, we establish nearly optimal upper and lower bounds for Gaussian universality approximation, measured in Kolmogorov distance, over the class of approximately polynomial functions of high-dimensional random vectors. The upper bounds adapt the invariance principle of Mossel, O'Donnell and Oleszkiewicz (2010) to high-dimensional vectors and to functions beyond multilinear forms. As applications, we obtain a delta method for high-dimensional data with non-Gaussian limits, a necessary and sufficient condition for asymptotic normality, and simple estimators that are asymptotically normal but for which the bootstrap fails to be consistent. We also extend recent results on the high-dimensional degeneracy of non-degenerate U-statistics, the phase transition of MMD in two-sample tests with imbalanced data, and confidence spheres for high-dimensional averages. Our lower bound is constructive and shows that, for polynomials of even degree $m$, universality holds up to $m=o(\log n)$. As a corollary, the Gaussian polynomial approximation error of $\Omega(n^{-1/(6m)})$ is not improvable for even-degree U-statistics and V-statistics. Our results also explain how universality results for U-statistics and V-statistics differ significantly in their dependence on dimension.
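The two quantitative statements in the abstract fit together, as a one-line calculation shows. This is our own consistency check, reading $n^{-1/6m}$ as $n^{-1/(6m)}$, allowing $m$ to depend on $n$, and ignoring constants:

```latex
n^{-1/(6m)} = \exp\!\left(-\frac{\log n}{6m}\right) \longrightarrow 0
\quad\Longleftrightarrow\quad \frac{\log n}{m} \to \infty
\quad\Longleftrightarrow\quad m = o(\log n).
```

For example, with $n = 10^6$ the rate is $10^{-6/12} = 10^{-1/2} \approx 0.32$ for $m = 2$, but already $10^{-6/60} = 10^{-0.1} \approx 0.79$ for $m = 10$, which illustrates how quickly the guarantee degrades as the degree grows.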

Paper Structure (49 sections, 526 equations)

Theorems & Definitions (22)

  • Proof
  • Proof of prop:gaussian
  • Proof
  • Proof of lem:assumption:taylor
  • Proof of lem:main:taylor
  • Proof of lem:gaussian:linear:moment
  • Proof of thm:main and cor:non:multilinear
  • Proof of lem:approx:XplusY:by:Y
  • Proof of thm:VD
  • Proof of lem:lower:V
  • ...and 12 more