Testing the equality of estimable parameters across many populations

Marcos Romero-Madroñal, María de los Remedios Sillero-Denamiel, María Dolores Jiménez-Gamero

TL;DR

The paper develops a nonparametric, kernel-based framework for testing the equality of estimable parameters across a very large number of populations in the large-k, small-n regime. It introduces a $U$-statistic–based test statistic $T_k$ that consistently estimates $D_k$ and is asymptotically distribution-free under $H_0$ via a ratio-consistent variance estimator; a linear bootstrap and a random-sampling approach enhance finite-sample performance and scalability. Theoretical results show asymptotic normality under the null and provide conditions for strong power against alternatives, with explicit characterizations of the power in terms of the signal-to-noise ratio. Simulations for Gini mean difference and Spearman’s rho, along with an IPUMS USA data application, demonstrate good size control and practical effectiveness in detecting heterogeneity across many populations. The work lays groundwork for extending to dependent data and multivariate functionals in high-dimensional settings.

Abstract

The comparison of a parameter in $k$ populations is a classical problem in statistics. Tests for the equality of means or variances are typical examples. Most procedures designed to deal with this problem assume that $k$ is fixed and that samples with increasing sample sizes are available from each population. This paper introduces and studies a test for the comparison of an estimable parameter across $k$ populations, when $k$ is large and the sample sizes from each population are small compared with $k$. The proposed test statistic is asymptotically distribution-free under the null hypothesis of parameter homogeneity, enabling asymptotically exact inference without parametric assumptions. Additionally, the behaviour of the proposal is studied under alternatives. Simulations are conducted to evaluate its finite-sample performance, and a linear bootstrap method is implemented to improve its behaviour for small $k$. Finally, an application to a real dataset is presented.
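
To make "estimable parameter" concrete, the following is a minimal sketch of one of the functionals named in the TL;DR, the Gini mean difference, computed as a $U$-statistic with the standard kernel $h(x, y) = |x - y|$ in each population. It only illustrates the per-population estimates that a homogeneity test would compare; it is not the paper's test statistic $T_k$. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def gini_mean_difference(sample):
    """U-statistic of degree 2 with kernel h(x, y) = |x - y|.

    Averages the kernel over all unordered pairs of observations,
    giving an unbiased estimate of E|X - Y| for i.i.d. X, Y.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    pairs = list(combinations(sample, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Hypothetical toy data: k = 3 populations, each with a small sample.
populations = [
    [1.0, 2.0, 4.0],
    [0.0, 3.0],
    [5.0, 5.0, 5.0, 5.0],
]

# One estimate per population; the null hypothesis of homogeneity says
# the underlying parameters behind these estimates are all equal.
estimates = [gini_mean_difference(s) for s in populations]
print(estimates)
```

In the large-$k$, small-$n$ regime described above, each individual estimate is noisy because its sample is small, so any evidence against homogeneity has to come from aggregating information across the many populations.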

Paper Structure

This paper contains 16 sections, 14 theorems, 169 equations, 3 figures, 6 tables.

Key Result

Lemma 1

Suppose that eq:independent_sample and ass:boundenessh2 hold. Then, Moreover, if $\mathbb{E}\{|h(X_{i1},\ldots,X_{im})|^{2+\delta}\} < M$, $\forall i$ for some $M > 0$, then

Figures (3)

  • Figure 1: Scheme of the asymptotic power of the test for different rates of $a_k$ and $n_0$.
  • Figure 2: Boxplot of estimated GMDs per county.
  • Figure 3: Boxplot of approximated Spearman's rhos per county.

Theorems & Definitions (30)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 1
  • Remark 1
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • ...and 20 more