BEAUTY Powered BEAST

Kai Zhang, Wan Zhang, Zhigen Zhao, Wen Zhou

TL;DR

This work demonstrates that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics, and devises the BEAST, which improves on the empirical power of many existing tests against a wide spectrum of common alternatives and gives a clear interpretation of the form of dependency when the result is significant.

Abstract

We study distribution-free goodness-of-fit tests with the proposed Binary Expansion Approximation of UniformiTY (BEAUTY) approach. This method generalizes Euler's renowned formula and approximates the characteristic function of any copula through a linear combination of expectations of binary interactions from marginal binary expansions. This theory unifies many important tests of independence, each of which can be approximated by a specific quadratic form of symmetry statistics whose deterministic weight matrix characterizes the power properties of the test. To achieve robust power, we examine test statistics with data-adaptive weights, referred to as the Binary Expansion Adaptive Symmetry Test (BEAST). For any given alternative, we demonstrate that the Neyman-Pearson test can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a useful benchmark of feasible power. To approach this oracle power, we devise the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves on the empirical power of many existing tests against a wide spectrum of common alternatives and delivers a clear interpretation of the form of dependency when the result is significant.
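
To make "symmetry statistics" concrete, here is a minimal sketch of how they might be computed for a bivariate sample. It assumes the usual binary expansion testing setup: each margin is rank-transformed to $(-1,1)$, truncated to its first few Rademacher digits, and a binary interaction is the product of a nonempty subset of digits from each margin; the symmetry statistic is the sum of that product over the sample. The function names (`rademacher_digits`, `copula_scale`, `symmetry_statistics`) are hypothetical and not taken from the paper, and the adaptive weighting and resampling steps of the BEAST are not implemented here.

```python
import itertools
import numpy as np

def rademacher_digits(u, depth):
    """First `depth` digits in {-1, +1} of each u in [-1, 1]."""
    residual = np.array(u, dtype=float, ndmin=1)
    digits = np.empty((residual.size, depth), dtype=int)
    for d in range(depth):
        digits[:, d] = np.where(residual >= 0, 1, -1)
        residual = residual - digits[:, d] * 2.0 ** -(d + 1)
    return digits

def copula_scale(x):
    """Map a sample to (-1, 1) through its ranks (assumes no ties)."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return (2 * ranks - 1) / x.size - 1

def symmetry_statistics(x, y, depth=2):
    """Sum of each cross binary interaction over the sample (illustrative)."""
    a = rademacher_digits(copula_scale(x), depth)
    b = rademacher_digits(copula_scale(y), depth)
    # All nonempty subsets of digit positions 0..depth-1.
    subsets = [list(s) for r in range(1, depth + 1)
               for s in itertools.combinations(range(depth), r)]
    stats = {}
    for sa in subsets:
        for sb in subsets:
            interaction = a[:, sa].prod(axis=1) * b[:, sb].prod(axis=1)
            stats[(tuple(sa), tuple(sb))] = int(interaction.sum())
    return stats

rng = np.random.default_rng(1)
x = rng.normal(size=128)
stats_dependent = symmetry_statistics(x, x, depth=2)
```

When `y` is identical to `x`, the interaction of the two leading digits is $+1$ for every point, so its symmetry statistic equals the sample size. Under independence each statistic is a sum of balanced $\pm 1$ terms and should be of order $\sqrt{n}$.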

Paper Structure

This paper contains 17 sections, 7 theorems, 28 equations, 3 figures, 3 tables.

Key Result

Theorem 1.1

If $U \sim \mathrm{Unif}[-1,1]$, then $U=\sum_{d=1}^\infty 2^{-d} A_d$, where $A_d \stackrel{i.i.d.}{\sim} \mathrm{Rademacher}$, that is, $A_d\in\{-1,1\}$ with equal probability.
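
The expansion in Theorem 1.1 can be checked numerically. The sketch below is an illustration rather than code from the paper: it extracts the digits greedily (take the sign of the running residual, subtract $2^{-d} A_d$, repeat), which leaves a residual of at most $2^{-D}$ after $D$ digits. The names `binary_expansion` and `partial_sum` are hypothetical.

```python
import numpy as np

def binary_expansion(u, depth):
    """Return the first `depth` digits A_1..A_depth in {-1, +1} of each u."""
    residual = np.array(u, dtype=float, ndmin=1)
    digits = np.empty((residual.size, depth), dtype=int)
    for d in range(1, depth + 1):
        a_d = np.where(residual >= 0, 1, -1)   # sign of what is left
        digits[:, d - 1] = a_d
        residual = residual - a_d * 2.0 ** (-d)
    return digits

def partial_sum(digits):
    """Truncated series sum_{d=1}^{depth} 2^{-d} A_d."""
    depth = digits.shape[1]
    weights = 2.0 ** -np.arange(1, depth + 1)
    return digits @ weights

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=10_000)
digits = binary_expansion(u, depth=10)
max_error = np.max(np.abs(partial_sum(digits) - u))
```

Here `max_error` is bounded by $2^{-10}$, since each step halves the largest possible residual. For uniform `u`, the columns of `digits` should each look like a balanced coin flip, which is the Rademacher claim of the theorem.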

Figures (3)

  • Figure 1: Power curves of various methods for testing bivariate independence under four alternatives. The sample size is $n=128$, the depth of the BEAST is $3$, and the significance level is $0.1$. The BEAST with oracle provides a benchmark of feasible power in all cases. The power of the BEAST ranks within the top three among all tests in every case and is the highest in the "Parabolic" and "Circle" cases.
  • Figure 2: Power curves of various methods for testing independence between $(X_1,X_2)$ and $Y$ under four alternatives. The sample size is $n=128$, the depth of the BEAST is $3$, and the significance level is $0.1$. The BEAST with oracle provides a benchmark of feasible power in all cases. The power of the BEAST is the highest among all tests for all nonlinear forms of dependency.
  • Figure 3: The binary interaction that explains the relationship between the location and brightness of stars. The left panel shows the scatter plot of galactic latitude ($X$) and absolute magnitude ($Y$) on the original scale. The middle panel shows the empirical copula of this distribution, overlaid with the binary interaction selected most frequently across subsamples. There are 162 points in the white regions and 94 points in the blue regions, giving a symmetry statistic of $68$ and a $Z$-statistic of $4.25$ for testing the balance of points between the white and blue regions. The right panel shows the scatter plot on the original scale overlaid with the same binary interaction. Brighter stars (lower $Y$) tend to fall between $-16.1^\circ$ and $23.4^\circ$ in latitude, while dimmer stars (higher $Y$) tend to fall outside this interval of $X$. This pattern provides a scientifically meaningful explanation of the statistical significance.
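
The numbers in the Figure 3 caption fit the standard normalization of a symmetry statistic. The caption does not spell this out, so the following is an assumption: under the null, each of the $n = 162 + 94 = 256$ points falls in a white or a blue region with probability $1/2$, so the signed count $S$ has mean $0$ and variance $n$. Then

$$
S = 162 - 94 = 68, \qquad Z = \frac{S}{\sqrt{n}} = \frac{68}{\sqrt{256}} = \frac{68}{16} = 4.25 .
$$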

Theorems & Definitions (7)

  • Theorem 1.1
  • Lemma 1.2: Binary Euler's Equation
  • Lemma 2.1
  • Theorem 2.2: Binary Expansion Approximation of Uniformity, BEAUTY
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3