Using the rejection sampling for finding tests

Markku Kuismin

TL;DR

The paper introduces a rejection-sampling framework for constructing statistical tests in arbitrary dimensions by mapping the acceptance mechanism to a test statistic $\rho(\mathbf{X})=E_U[T(\mathbf{X})]$ and calibrating its null distribution via Monte Carlo. It develops AR-based tests for goodness-of-fit, mean-vector testing, and equality of group means, with the asymptotic structure $nT(\mathbf{X})$ following a Poisson-binomial distribution under $H_0$. Through extensive simulations, the AR tests achieve power comparable to uniformly most powerful tests and can outperform competing methods in goodness-of-fit problems. Real data applications to Amyloid-$\beta$ levels and reaction-time distributions demonstrate practical applicability, flexibility across dimensions, and robust Type I error control, all while providing an intuitive, easy-to-implement approach.
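Calibrating a null distribution via Monte Carlo can be sketched generically as follows. This is a minimal illustration, not the paper's procedure: the function names, the `lower_tail` switch, and the toy statistic (a sample mean) are assumptions made here, and this summary does not state which tail the AR tests use for rejection.

```python
import random


def monte_carlo_p_value(observed, simulate_null_stat, n_rep=999,
                        lower_tail=True, seed=0):
    """Estimate a p-value by simulating the statistic under H0.

    simulate_null_stat(rng) must return one draw of the statistic under H0.
    The +1 in numerator and denominator counts the observed value itself,
    so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_rep):
        stat = simulate_null_stat(rng)
        # Count null draws at least as extreme as the observed statistic.
        if (stat <= observed) if lower_tail else (stat >= observed):
            extreme += 1
    return (extreme + 1) / (n_rep + 1)


def null_mean(rng, n=20):
    """Hypothetical toy statistic: mean of n standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


# An observed value far below anything the null produces gives the
# smallest attainable p-value, 1 / (n_rep + 1).
p_small = monte_carlo_p_value(-5.0, null_mean)
# An observed value far above every null draw gives a lower-tail p-value of 1.
p_large = monte_carlo_p_value(5.0, null_mean)
```

The number of replicates bounds the resolution of the estimate, so `n_rep` should be chosen with the intended significance level in mind.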

Abstract

A new method based on rejection sampling for finding statistical tests is proposed. This method is conceptually intuitive, easy to implement, and applicable in arbitrary dimensions. To illustrate its potential applicability, three distinct empirical examples are presented: (1) examining the differences between group means of correlated (repeated) or independent samples, (2) examining whether a mean vector equals a specific fixed vector, and (3) investigating whether samples come from a specific population distribution. The simulation examples indicate that the new test has statistical power similar to that of uniformly most powerful (unbiased) tests. Moreover, these examples demonstrate that the new test is a powerful goodness-of-fit test.

Paper Structure

This paper contains 14 sections, 1 theorem, 13 equations, 6 figures, 4 tables.

Key Result

Theorem 1

Let $r_i = f_0(X_i)/\widehat{f}(X_i)$. Let $S_1 = \{r_i \mid r_i \geq 1\}$ and $S_2 = \{r_i \mid r_i < 1\}$ denote the sets of ratios $r_i$, $i = 1, \ldots, n$, that are at least one and smaller than one, respectively. Let $\#S_1$ denote the cardinality of the set $S_1$. Let $T(\b
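The ratios $r_i$ and the Poisson-binomial distribution mentioned for $nT(\mathbf{X})$ can be illustrated with a small sketch. It rests on an assumption that this summary does not confirm: observation $i$ is accepted with probability $\min(1, r_i)$, so every ratio in $S_1$ is accepted with certainty and every ratio in $S_2$ is accepted as an independent Bernoulli trial. The function names and the example ratios are hypothetical.

```python
def acceptance_probabilities(ratios):
    """ASSUMED rule: accept observation i with probability min(1, r_i)."""
    return [min(1.0, r) for r in ratios]


def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli(p_i) trials.

    Built by dynamic programming: each trial either leaves the running
    count unchanged (probability 1 - p) or increases it by one (probability p).
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


# Hypothetical ratios: two fall in S_1 (1.3 and 1.0), two in S_2 (0.5 and 0.8).
ratios = [1.3, 0.5, 0.8, 1.0]
probs = acceptance_probabilities(ratios)
pmf = poisson_binomial_pmf(probs)   # pmf[k] = P(exactly k acceptances)
size_s1 = sum(1 for r in ratios if r >= 1.0)
expected_rate = sum(probs) / len(probs)  # mean acceptance proportion
```

Under this assumed rule the count of acceptances can never fall below $\#S_1$, so the probability mass sits on the values from $\#S_1$ to $n$.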

Figures (6)

  • Figure 1: Probability mass function estimate of the statistic $nT(\mathbf{X})$ based on 10000 simulation replicates (gray bars) and the Poisson binomial distribution (blue bars) when the AR test is used to test univariate normality (see Definition \ref{def:ar_stat_gof} and Section \ref{sec:goodness-of-fit}). Different panels represent different sample sizes: (A) 30; (B) 40; and (C) 70. The vertical dashed line illustrates the value $n\rho(\mathbf{X})$.
  • Figure 2: Population correlation vs. the empirical power of the AR test (red solid line), Likelihood Ratio test (LR) (dotted purple line), paired $t$-test (solid red line), and independent sample $t$-test ($t$-test) (dashed blue line) at $\alpha = 0.05$. Different panels illustrate the power vs. correlation when the true difference between two group means is: (A) 0.4; (B) 0.6; and (C) 0.8. Panel (D) illustrates the estimated Type I error associated with the different tests. The predetermined significance level 0.05 is illustrated with a horizontal dashed line in panel (D). Here $n = 52$.
  • Figure 3: Difference between two group means vs. the empirical power of the AR test (solid red line), Likelihood Ratio (LR) test (dashed green line), and Student's $t$-test (dashed blue line) for different sample sizes: (A) $n = 26$; (B) $n = 64$; and (C) $n = 394$. The predetermined significance level 0.05 is illustrated with a horizontal dashed line.
  • Figure 4: Population correlation vs. power function of the AR test using the population covariance (AR pop. cor) (red solid line), AR test using the sample covariance (AR sample cor) (green dashed line), empirical Likelihood Ratio test (EL) (dashed blue line), and Likelihood Ratio test (LR) (purple dashed line). Different panels illustrate the power vs. correlation when the true difference between two group means is: (A) 0.4; (B) 0.6; and (C) 0.8. Panel (D) illustrates the estimated Type I error associated with the different tests. The predetermined significance level 0.05 is illustrated with a horizontal dashed line in panel (D). Here $n = 52$.
  • Figure 5: Value of the scale parameter of the location-scale version of the $t$-distribution ($df=3$) vs. the empirical statistical power of the AR test (AR) (green dashed line), Cramér–von Mises test (CVM) (blue dashed line), Kolmogorov–Smirnov test (KS) (purple dashed line), and Anderson–Darling test (AD) (solid red line). Different panels represent different sample sizes: (A) 20; (B) 30; and (C) 50. The vertical solid line corresponds to the value of the scale parameter under $H_0$, $\sigma_0 = 2.5$. The predetermined significance level $\alpha = 0.05$ is illustrated with a horizontal dashed line.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • proof