Testing the equality of estimable parameters across many populations
Marcos Romero-Madroñal, María de los Remedios Sillero-Denamiel, María Dolores Jiménez-Gamero
TL;DR
The paper develops a nonparametric, kernel-based framework for testing the equality of estimable parameters across a very large number of populations in the large-$k$, small-$n$ regime. It introduces a $U$-statistic–based test statistic $T_k$ that consistently estimates $D_k$ and, once standardized with a ratio-consistent variance estimator, is asymptotically distribution-free under $H_0$; a linear bootstrap and a random-sampling approach improve finite-sample performance and scalability. Theoretical results establish asymptotic normality under the null and give conditions under which the test attains high power against alternatives, with the power characterized explicitly in terms of the signal-to-noise ratio. Simulations for the Gini mean difference and Spearman’s rho, along with an application to IPUMS USA data, show good size control and practical effectiveness in detecting heterogeneity across many populations. The work lays the groundwork for extending the approach to dependent data and to multivariate functionals in high-dimensional settings.
Abstract
The comparison of a parameter in $k$ populations is a classical problem in statistics. Tests for the equality of means or variances are typical examples. Most procedures designed to deal with this problem assume that $k$ is fixed and that samples with increasing sample sizes are available from each population. This paper introduces and studies a test for the comparison of an estimable parameter across $k$ populations, when $k$ is large and the sample sizes from each population are small compared with $k$. The proposed test statistic is asymptotically distribution-free under the null hypothesis of parameter homogeneity, enabling asymptotically exact inference without parametric assumptions. Additionally, the behaviour of the proposed test is studied under alternatives. Simulations are conducted to evaluate its finite-sample performance, and a linear bootstrap method is implemented to improve its behaviour for small $k$. Finally, an application to a real dataset is presented.
