Learning Centre Partitions from Summaries

Zinsou Max Debaly, Jean-Francois Ethier, Michael H. Neumann, Félix Camirand-Lemyre

TL;DR

To improve finite-sample integration, the paper introduces a multi-round bootstrap CoC that re-evaluates merges across independently resampled summary sets and proves a \emph{golden-partition recovery} result: as the number of rounds grows with $n$, the true partition is recovered with probability tending to one.

Abstract

Multi-centre studies increasingly rely on distributed inference, where sites share only centre-level summaries. Homogeneity of parameters across centres is often violated, motivating methods that both \emph{test} for equality and \emph{learn} centre groupings before estimation. We develop multivariate Cochran-type tests that operate on summary statistics and embed them in a sequential, test-driven \emph{Clusters-of-Centres (CoC)} algorithm that merges centres (or blocks) only when equality is not rejected. We derive the asymptotic $\chi^2$-mixture distributions of the test statistics and provide plug-in estimators for implementation. To improve finite-sample integration, we introduce a multi-round bootstrap CoC that re-evaluates merges across independently resampled summary sets; under mild regularity and a separation condition, we prove a \emph{golden-partition recovery} result: as the number of rounds grows with $n$, the true partition is recovered with probability tending to one. We also give simple numerical guidelines, including a plateau-based stopping rule, to make the multi-round procedure reproducible. Simulations and a real-data analysis of U.S. airline on-time performance (2007) show accurate heterogeneity detection and partitions that change little with the choice of resampling scheme.
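As a rough illustration of test-driven merging from summaries, the sketch below substitutes the classical univariate Cochran's Q with inverse-variance weights (so each two-block statistic is $\chi^2_1$ under equality) for the paper's multivariate tests and $\chi^2$-mixture calibration. It assumes each centre shares one scalar estimate and its variance. The greedy rule (merge the pair of blocks with the largest p-value while that p-value exceeds `alpha`) and the names `clusters_of_centres`, `pair_pvalue` and `pool` are hypothetical, not the authors' CoC algorithm.

```python
import math
from itertools import combinations


def pair_pvalue(a, va, b, vb):
    """Two-block Cochran Q with inverse-variance weights; chi^2_1 under H0."""
    q = (a - b) ** 2 / (va + vb)
    # Survival function of chi^2 with 1 df: P(Z^2 >= q) = erfc(sqrt(q / 2)).
    return math.erfc(math.sqrt(q / 2.0))


def pool(a, va, b, vb):
    """Inverse-variance pooled estimate and variance of two blocks."""
    wa, wb = 1.0 / va, 1.0 / vb
    return (wa * a + wb * b) / (wa + wb), 1.0 / (wa + wb)


def clusters_of_centres(estimates, variances, alpha=0.05):
    """Hypothetical greedy merging: join the most compatible pair until every pair is rejected."""
    # Each block: (centre indices, pooled estimate, pooled variance).
    blocks = [([k], est, var) for k, (est, var) in enumerate(zip(estimates, variances))]
    while len(blocks) > 1:
        # p-value of the equality test for every pair of current blocks.
        candidates = [
            (pair_pvalue(blocks[i][1], blocks[i][2], blocks[j][1], blocks[j][2]), i, j)
            for i, j in combinations(range(len(blocks)), 2)
        ]
        p, i, j = max(candidates)
        if p <= alpha:  # equality rejected for all remaining pairs: stop merging
            break
        est, var = pool(blocks[i][1], blocks[i][2], blocks[j][1], blocks[j][2])
        merged = (blocks[i][0] + blocks[j][0], est, var)
        del blocks[j]  # j > i, so delete j first to keep index i valid
        del blocks[i]
        blocks.append(merged)
    return sorted(sorted(b[0]) for b in blocks)


# Toy summaries: centres 0-2 share one mean, centres 3-4 another.
print(clusters_of_centres([0.0, 0.1, -0.05, 2.0, 2.1], [0.01] * 5))  # -> [[0, 1, 2], [3, 4]]
```

Because each merge replaces two blocks by their pooled summary, later tests compare blocks rather than single centres, which mirrors the abstract's "centres (or blocks)" wording; a multi-round variant would rerun such a procedure on resampled summaries.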

Paper Structure

This paper contains 38 sections, 15 theorems, 172 equations, 2 figures, 3 algorithms.

Key Result

Lemma 1

Consider the centre-specific asymptotic decomposition (eq::asymptoticDecomposition) and suppose that assumptions (A1) to (A4) hold. Then, under the null hypothesis $H_0 : \theta_{0,1} = \theta_{0,2} = \cdots = \theta_{0,K} = \theta_0$, the test statistic converges in distribution to a $\chi^2$-mixture, where $\{\lambda_\ell\}_{\ell=1}^{Kp}$ are the nonnegative eigenvalues of $\overline{Q}^{1/2} H^\top H \overline{Q}^{1/2}$ and $\{\chi_\ell^2\}$ …
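A $\chi^2$-mixture limit of this kind is usually written as $\sum_{\ell=1}^{Kp} \lambda_\ell \chi_\ell^2$. The sketch below assumes that form, with the $\chi_\ell^2$ taken to be independent $\chi^2_1$ variables (their definition is truncated in this summary), and shows one way to turn the weights into a p-value: compute the eigenvalues of $\overline{Q}^{1/2} H^\top H \overline{Q}^{1/2}$ numerically, then approximate the tail probability by Monte Carlo. The function names, toy matrices, simulation size and seed are illustrative; in practice plug-in estimates of $\overline{Q}$ and $H$ would be used.

```python
import numpy as np


def mixture_weights(Q_bar, H):
    """Eigenvalues of Q_bar^{1/2} H^T H Q_bar^{1/2}, clipped at zero."""
    # Symmetric square root of the PSD matrix Q_bar via its eigendecomposition.
    w, V = np.linalg.eigh(Q_bar)
    Q_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    M = Q_half @ H.T @ H @ Q_half
    # Remove tiny negative values caused by rounding.
    return np.clip(np.linalg.eigvalsh(M), 0.0, None)


def mixture_pvalue(t, lam, n_sims=200_000, seed=0):
    """Monte Carlo estimate of P(sum_l lam_l * chi2_l >= t), chi2_l iid chi^2_1 (assumed)."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1.0, size=(n_sims, len(lam))) @ np.asarray(lam)
    return float(np.mean(draws >= t))


# Toy case K = 2, p = 1: equality contrast H = [1, -1], Q_bar = diag(1, 3).
lam = mixture_weights(np.diag([1.0, 3.0]), np.array([[1.0, -1.0]]))
print(lam)  # approximately [0, 4]
print(mixture_pvalue(4 * 3.841459, lam))  # close to 0.05
```

In this toy case the only nonzero weight equals $H\overline{Q}H^\top = 4$, so the limit is $4\chi^2_1$. Exact alternatives such as Imhof's numerical inversion avoid simulation noise; Monte Carlo is used here only because it is short.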

Figures (2)

  • Figure 1: Performance curves versus $n$ (solid: $R=50$, dashed: $R=100$), for each $(\delta,u_n)$ setting, with $L=4$ and $K=20$.
  • Figure 2: Performance curves versus $n$ (solid: $R=50$, dashed: $R=100$), for each $(\delta,u_n)$ setting, with $L=6$ and $K=40$.

Theorems & Definitions (15)

  • Lemma 1
  • Proposition 1
  • Lemma 2
  • Proposition 2
  • Lemma 3
  • Theorem 1
  • Lemma 4
  • Proposition 3
  • Theorem 2
  • Theorem 3
  • ...and 5 more