Post hoc inference via joint family-wise error rate control

Gilles Blanchard, Pierre Neuvial, Etienne Roquain

TL;DR

The paper tackles post hoc inference in large-scale multiple testing by introducing a user-agnostic bound based on the joint family-wise error rate (JER), which holds uniformly over any user-selected rejection set. It develops a general framework built on a reference family of rejection sets, threshold templates, and $\lambda$-calibration to adapt to unknown dependence and signal sparsity, with explicit constructions under both known and unknown dependence. Two template families (linear and balanced) are analyzed, each with single-step and step-down procedures; the Simes/Hommel inequalities serve as a baseline on which the adaptive procedures improve. Numerical experiments show JER control and improved power, and give practical guidance on template choice and calibration for exploratory analyses.

Abstract

We introduce a general methodology for post hoc inference in a large-scale multiple testing framework. The approach is called "user-agnostic" in the sense that the statistical guarantee on the number of correct rejections holds for any set of candidate items selected by the user (after having seen the data). This task is investigated by defining a suitable criterion, named the joint-family-wise-error rate (JER for short). We propose several procedures for controlling the JER, with a special focus on incorporating dependencies while adapting to the unknown quantity of signal (via a step-down approach). We show that our proposed setting incorporates as particular cases a version of the higher criticism as well as the closed testing based approach of Goeman and Solari (2011). Our theoretical statements are supported by numerical experiments.
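To make the "user-agnostic" guarantee concrete, here is a minimal Python sketch of a post hoc bound built from the Simes baseline mentioned in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: it takes the reference family to be $R_k = \{i : p_i \leq \alpha k/m\}$ with $\zeta_k = k-1$, and assumes the $p$-values satisfy the Simes inequality (for example, under independence). The function name `simes_post_hoc_bound` and the toy $p$-values are hypothetical.

```python
from typing import Collection, Sequence


def simes_post_hoc_bound(
    p_values: Sequence[float],
    selected: Collection[int],
    alpha: float = 0.05,
) -> int:
    """Upper bound on the number of false positives among `selected`.

    Assumed reference family (Simes baseline): R_k = {i : p_i <= alpha*k/m},
    with at most zeta_k = k - 1 false positives in each R_k, jointly over k,
    with probability at least 1 - alpha.
    """
    m = len(p_values)
    bound = len(selected)  # trivial bound: every selected item is a false positive
    for k in range(1, m + 1):
        threshold = alpha * k / m
        # Selected items outside R_k are counted as potential false positives;
        # inside R_k, at most k - 1 items can be false positives.
        outside = sum(1 for i in selected if p_values[i] > threshold)
        bound = min(bound, outside + k - 1)
    return bound


# Hypothetical toy p-values for m = 10 hypotheses.
p = [0.001, 0.002, 0.003, 0.2, 0.5, 0.8, 0.9, 0.032, 0.6, 0.7]
chosen = {0, 1, 2, 7}  # any set picked after looking at the data
false_positive_bound = simes_post_hoc_bound(p, chosen)
true_positive_bound = len(chosen) - false_positive_bound
print(false_positive_bound, true_positive_bound)  # prints "1 3"
```

Because the bound holds simultaneously for every subset, `chosen` can be changed after inspecting the data without invalidating the guarantee, which is what post hoc inference requires.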

Paper Structure

This paper contains 37 sections, 10 theorems, 41 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Proposition 2.1

Let $\mathfrak{R}=(R_k(X),\zeta_k(X))_{1\leq k\leq K}$ be a data-dependent collection of subsets $R_k$ of $\mathbb{N}_m$ and of integers $\zeta_k$. Then for any ${\mathcal{H}}_0 \subset \mathbb{N}_m$, ${\mathcal{H}}_1=\mathbb{N}_m \setminus {\mathcal{H}}_0$, the event $\mathcal{E}(\mathfrak{R},$ ...

Figures (5)

  • Figure 1: Illustration of the post hoc selection effect. Right: virtual data set with $1000$ measurements. Left: data set of $55$ measurements selected from the right data set. Measurements were generated as i.i.d. absolute values of $\mathcal{N}(0,1)$.
  • Figure 2: Toy example: use of a reference family with two subsets $A$ and $B$ to build a post hoc bound on the number of true positives in an arbitrary candidate rejection set $R$.
  • Figure 3: JER control based on the linear template for equi-correlated test statistics.
  • Figure 4: JER control based on the balanced template for equi-correlated test statistics, with $K=m$ and $K=10$.
  • Figure 5: Averaged power of JER controlling procedures for independent test statistics.
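
The selection effect described in the Figure 1 caption can be reproduced with a short Python simulation. This is a sketch under assumptions: the caption does not say how the $55$ measurements were chosen, so the code assumes they are the $55$ largest values; the function name `simulate_selection` and the seed are hypothetical.

```python
import random
import statistics


def simulate_selection(m: int = 1000, n_selected: int = 55, seed: int = 0):
    """Draw m i.i.d. |N(0,1)| values and keep the n_selected largest.

    Keeping the largest values is an assumed selection rule; the caption
    only states that 55 measurements were selected from the 1000.
    """
    rng = random.Random(seed)
    data = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]
    selected = sorted(data, reverse=True)[:n_selected]
    return data, selected


data, selected = simulate_selection()
# All values are pure noise, yet the selected subset looks like strong signal.
print(f"mean of all {len(data)} values: {statistics.mean(data):.2f}")
print(f"mean of the {len(selected)} selected values: {statistics.mean(selected):.2f}")
```

Viewed in isolation, the selected values appear far from zero even though no signal is present, which is why a guarantee that remains valid after data-driven selection is needed.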

Theorems & Definitions (16)

  • Proposition 2.1
  • Remark 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 2.5
  • Theorem 4.1: Simes and Hommel inequalities
  • Definition 5.1
  • Lemma 5.2
  • Definition 5.3
  • Proposition 5.4
  • ...and 6 more