Detecting Where Effects Occur by Testing Hypotheses in Order

Jake Bowers; David Kim; Nuole Chen

Abstract

Experimental evaluations of public policies often randomize a new intervention within many sites or blocks. After a report of an overall result -- statistically significant or not -- the natural question from a policy maker is: \emph{where} did any effects occur? Standard adjustments for multiple testing provide little power to answer this question. In simulations modeled after a 44-block education trial, the Hommel adjustment -- among the most powerful procedures controlling the family-wise error rate (FWER) -- detects effects in only 11\% of truly non-null blocks. We develop a procedure that tests hypotheses top-down through a tree: test the overall null at the root, then groups of blocks, then individual blocks, stopping any branch where the null is not rejected. In the same 44-block design, this approach detects effects in 44\% of non-null blocks -- roughly four times the detection rate. A stopping rule and valid tests at each node suffice for weak FWER control. We show that the strong-sense FWER depends on how rejection probabilities accumulate along paths through the tree. This yields a diagnostic: when power decays fast enough relative to branching, no adjustment is needed; otherwise, an adaptive $\alpha$-adjustment restores control. We apply the method to 25 MDRC education trials and provide an R package, \texttt{manytestsr}.

Paper Structure (30 sections, 9 theorems, 30 equations, 5 figures, 7 tables, 1 algorithm)

Where do block-level effects occur?
Testing to Detect Effects in Blocks
Testing in every block: the basic problem of multiple testing
Sequential Structured Testing and Weak Control of the FWER
Simulation Study of Weak Control of the FWER to Illustrate the Proof
Why might weak control suffice?
Strong control of the FWER
Strong control via error load and adaptive adjustment
Branch pruning.
An Example: A Simulated Version of The Detroit Promise Program
Does the approach control the FWER?
Simulation Study of Strong FWER Control
Application: The MDRC RCT Data
Limitations
Test statistic.
...and 15 more sections

Key Result

Theorem 1

Conditions 1 and 2 suffice for weak FWER control. A family of true null hypotheses organized on an irregular or regular $k$-ary tree and tested following the stopping rule (Condition 1) with valid tests at each node (Condition 2) will produce a family-wise er...
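The intuition behind weak control can be checked with a small Monte Carlo sketch. Under the stopping rule, no node is tested unless its parent was rejected, so any false rejection requires a false rejection at the root, which a valid test produces with probability at most $\alpha$. The code below is an illustration under stated assumptions, not the paper's simulation: the function name `simulate_fwer` is hypothetical, the tree is a complete $k$-ary tree, and each node's $p$-value is drawn independently from Uniform(0, 1), which ignores the dependence that nested tests on shared data would have.

```python
import random


def simulate_fwer(k=3, levels=3, alpha=0.05, n_sims=20000, seed=1):
    """Estimate the FWER of top-down testing when every null hypothesis is true."""
    rng = random.Random(seed)
    n_nodes = sum(k ** level for level in range(levels))  # 1 + k + ... + k^(L-1)
    sims_with_false_rejection = 0
    for _ in range(n_sims):
        # rejected[i] records whether node i was rejected (level order, root = 0).
        rejected = [False] * n_nodes
        for i in range(n_nodes):
            # Stopping rule: a node is tested only if its parent was rejected.
            parent_rejected = True if i == 0 else rejected[(i - 1) // k]
            if parent_rejected:
                # Assumption: a valid test of a true null has a Uniform(0, 1) p-value.
                rejected[i] = rng.random() <= alpha
        if any(rejected):
            sims_with_false_rejection += 1
    return sims_with_false_rejection / n_sims


estimated_fwer = simulate_fwer()
print(f"estimated FWER with all nulls true: {estimated_fwer:.3f}")
```

In this setup the estimate should sit near `alpha` regardless of `k` or `levels`, because the root test is the only gate into the rest of the tree.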

Figures (5)

Figure 1: An administratively organized structure of blocks. A study randomly assigns people within offices ($B$) to a new intervention. Each office is an experimental block containing $m_b$ people assigned to the intervention and $n_b - m_b$ people assigned to the status quo.
Figure 2: A $k$-ary tree with $k=3$ children per node, $L=3$ levels, and $k^{L-1}=9$ terminal nodes or "leaves" representing individual experimental blocks.
Figure 3: Simplified flow of the Top-Down Testing and Splitting Algorithm with fixed false positive level $\alpha$. All blocks are in set $\mathcal{B}_1$; $\mathcal{B}_2$ is a subset of $\mathcal{B}_1$, and $\mathcal{B}_{4}$ is a subset of the blocks in $\mathcal{B}_2$. The $p$-value $p_1$ is the result of a test of the hypothesis of no effects using all the blocks (i.e., using the set $\mathcal{B}_1$), and $p_2$ is the $p$-value from a test of the null of no effects using only the blocks in $\mathcal{B}_2$. Testing stops when $p > \alpha$ or when a node contains a single block, that is, when $|\mathcal{B}_{i}|=1$ for node $i$, where $|\mathcal{B}|$ denotes the number of blocks in $\mathcal{B}$.
Figure 4: A $k$-ary tree with $k=3$ and $L=3$. Boxes show non-null nodes: since the leaf (node 5) is non-null, all of its ancestors are non-null. The other nodes in the tree are null.
Figure 5: Results of top-down testing in a simulation of 44 experimental blocks following the pre-specified experimental design of the Detroit Promise Program \citep{ratledge2019path}. Nine blocks within HFCC have non-zero effects (Cohen's $d = 0.80$); all other blocks are pure null. The algorithm identifies HFCC and descends into its cohorts and blocks while pruning null colleges. Abbreviations: Henry Ford Community College (HFCC), Macomb Community College (MCC), Oakland Community College (OCC), Schoolcraft College (SC), Wayne County Community College District (WCC). Blue nodes have non-zero causal effects.
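The flow described in the captions for Figures 2 through 4 can be sketched in a few lines. This is a minimal illustration, not the `manytestsr` implementation: the function names `kary_children` and `top_down_test` are hypothetical, nodes are assumed to be numbered 1, 2, ... in level order (so that node 5 is the first leaf of the $k=3$, $L=3$ tree, matching Figure 4), and the $p$-values are made up for the example rather than computed from data.

```python
def kary_children(k, levels):
    """Child lists for a complete k-ary tree, nodes numbered 1, 2, ... in level order."""
    n_nodes = sum(k ** level for level in range(levels))
    return {
        i: [c for c in range(k * (i - 1) + 2, k * i + 2) if c <= n_nodes]
        for i in range(1, n_nodes + 1)
    }


def top_down_test(children, p_values, alpha=0.05, root=1):
    """Test from the root down, descending only below rejected nodes.

    children: dict mapping each node to its list of child nodes (leaves map to []).
    p_values: dict mapping a node to the p-value of the test on that node's blocks.
    Returns (tested, rejected), each a list in visiting order.
    """
    tested, rejected = [], []
    queue = [root]
    while queue:
        node = queue.pop(0)
        tested.append(node)
        if p_values[node] > alpha:
            continue  # stopping rule: prune this branch, its descendants are never tested
        rejected.append(node)
        queue.extend(children[node])  # a leaf (single block) has no children, so testing ends
    return tested, rejected


tree = kary_children(k=3, levels=3)  # 13 nodes, 9 of them leaves

# Hypothetical p-values: leaf 5 is non-null, so its ancestors 2 and 1 are non-null too.
# Nodes 8 to 13 need no p-values because their parents (3 and 4) are not rejected.
p_values = {1: 0.01, 2: 0.02, 3: 0.40, 4: 0.70, 5: 0.001, 6: 0.30, 7: 0.55}

tested, rejected = top_down_test(tree, p_values, alpha=0.05)
print("tested:", tested)
print("rejected:", rejected)
```

Only 7 of the 13 nodes are tested in this example; the six leaves under nodes 3 and 4 are pruned, which is where the procedure saves tests relative to testing every block.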

Theorems & Definitions (26)

Theorem 1
Remark 1
Theorem 2: Conditions 1 and 2 suffice for weak FWER control; restated from main text
Proof: Proof of Theorem \ref{thm:weakctrl}
Remark 2
Remark 3: Relationship to prior work
Proposition 1: FWER Expression for Sequential Tree Testing
proof
Remark 4: When is the bound tight?
Theorem 3: FWER Decomposition by Level
...and 16 more
