Powerful rank verification for multivariate Gaussian data with any covariance structure
Anav Sood
TL;DR
This work addresses the problem of confirming that the top-$K$ observed values arise from the top-$K$ population means in multivariate Gaussian data with arbitrary covariance. It introduces a selective-inference framework that tests a data-dependent union of pairwise hypotheses, producing selective p-values based on standardized differences $D_{ij}$ and cross-covariances $\rho_{ij,k\ell}$, and controls Type I error conditioned on the top-$K$ index set. The main findings show that the inference is valid and often aligns with a two-sided mean-difference test on the boundary pair, while offering computationally efficient reductions in specific covariance structures (e.g., independent or equicorrelated data). The results generalize Gutmann’s isotropic, single-sample case to arbitrary $K$, any covariance, and any margin $\delta$, situating the approach relative to Tukey’s HSD and broader selective inference literature with implications for rank verification in ML benchmarks and clinical trials. The paper also clarifies when the proposed method dominates simultaneous inference and provides exact equivalences in key special cases, enhancing both theory and practice of post-selection rank verification.
Abstract
Upon observing $n$-dimensional multivariate Gaussian data, when can we infer that the largest $K$ observations came from the largest $K$ means? When $K=1$ and the covariance is isotropic, \cite{Gutmann} argue that this inference is justified when the two-sided difference-of-means test comparing the largest and second-largest observations rejects. Leveraging tools from selective inference, we provide a generalization of their procedure that applies for any $K$ and any covariance structure. We show that our procedure draws the desired inference whenever the two-sided difference-of-means test rejects for the pair of observations, one inside and one outside the top $K$, with the smallest standardized difference, and sometimes even when this test fails to reject. Using this insight, we argue that our procedure renders existing simultaneous inference approaches inadmissible when $n > 2$. When the observations are independent (with possibly unequal variances) or equicorrelated, our procedure corresponds exactly to running this two-sided difference-of-means test on that same pair.
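
As a rough illustration of the boundary-pair test described in the abstract, the following Python sketch finds the pair of observations (one inside the top $K$, one outside) with the smallest standardized difference and computes the two-sided difference-of-means p-value for that pair. It is a minimal sketch under stated assumptions, not the paper's implementation: the covariance matrix is taken as known, the margin is zero, every pairwise difference is assumed to have positive variance, and the name `boundary_pair_test` and its arguments are hypothetical. Per the abstract, rejection by this test is sufficient for the procedure to draw the inference, and the two coincide exactly only for independent or equicorrelated observations.

```python
import math

def boundary_pair_test(x, cov, K):
    """Two-sided difference-of-means test on the closest inside/outside pair.

    x   : list of n observations
    cov : n x n covariance matrix (list of lists), assumed known
    K   : size of the top set to verify (1 <= K < n)
    Returns (smallest standardized difference, (i, j), two-sided p-value).
    """
    n = len(x)
    # Indices sorted by observed value, largest first.
    order = sorted(range(n), key=lambda i: x[i], reverse=True)
    top, rest = order[:K], order[K:]

    best = None
    for i in top:
        for j in rest:
            # Var(X_i - X_j) = Sigma_ii + Sigma_jj - 2 * Sigma_ij (assumed > 0).
            var = cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
            d = (x[i] - x[j]) / math.sqrt(var)
            if best is None or d < best[0]:
                best = (d, i, j)

    d_min, i, j = best
    # Two-sided normal p-value: 2 * (1 - Phi(d)) = erfc(d / sqrt(2)) for d >= 0.
    p_value = math.erfc(d_min / math.sqrt(2.0))
    return d_min, (i, j), p_value


# Usage: three independent unit-variance observations, verifying the top 1.
x = [3.0, 0.5, 0.0]
cov = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
d_min, pair, p_value = boundary_pair_test(x, cov, K=1)
print(pair, round(d_min, 4), round(p_value, 4))
```

In this toy example the closest pair is the largest and second-largest observation, so the check reduces to the $K=1$ isotropic comparison attributed to \cite{Gutmann} in the abstract.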
