Global Sequential Testing for Multi-Stream Auditing

Beepul Bharti; Ambar Pal; Jeremias Sulam

TL;DR

This work constructs new sequential tests by merging test martingales, with different trade-offs in expected stopping times under sparse or dense alternative hypotheses. It also derives a new, balanced test whose expected stopping time bound matches Bonferroni's in the sparse setting and improves to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative.

Abstract

Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis that asserts that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by using ideas of merging test martingales with different trade-offs in expected stopping times under different, sparse or dense alternative hypotheses. We further derive a new, balanced test that achieves an improved expected stopping time bound that matches Bonferroni's in the sparse setting but that naturally results in $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
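To make the two rates in the abstract concrete, the following minimal sketch evaluates only the terms inside the big-O expressions for a few values of $k$. The function names are hypothetical, and the hidden constants (which depend on the strength of the signal in each stream) are ignored, so the numbers show scaling behavior only, not actual stopping times.

```python
import math


def bonferroni_scale(k: int, alpha: float) -> float:
    """Term inside the O(.) bound quoted for the Bonferroni test: ln(k / alpha)."""
    return math.log(k / alpha)


def dense_balanced_scale(k: int, alpha: float) -> float:
    """Term inside the O(.) bound quoted for the balanced test under a
    dense alternative: (1 / k) * ln(1 / alpha)."""
    return math.log(1.0 / alpha) / k


alpha = 0.01
for k in (10, 100, 1000):
    # Constants are omitted on purpose; only the trend in k is meaningful.
    print(k, round(bonferroni_scale(k, alpha), 3), round(dense_balanced_scale(k, alpha), 5))
```

The first term grows logarithmically with the number of streams, while the second shrinks as $1/k$, which is the gap the abstract describes in the dense setting.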

Paper Structure (31 sections, 22 theorems, 147 equations, 11 figures, 3 tables, 2 algorithms)


Introduction
Contributions
Problem Formulation
Related Work
A Primer in Sequential Testing
Testing a Stream-Specific Null $H_{i,0}$
Standard Global Sequential Testing
Bonferroni correction
Global Sequential Tests via Multivariate Strategies
Powerful Global Tests via Merging
Product Wealth Process
Average Wealth Process
A Balanced Sequential Test
Experiments
Synthetic
...and 16 more sections

Key Result

Theorem 3.1

Let $M$ be a nonnegative $P$-supermartingale with an initial value $M_0 \geq 0$. Then $\forall \alpha > 0$, $P(\exists t\geq 1: M_t \geq 1/\alpha) \leq \alpha{\mathbb{E}}_{P}[M_0]$.
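The inequality can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's construction: it simulates a simple betting wealth process $M_t = \prod_{s \le t} (1 + \lambda X_s)$ with $M_0 = 1$, where the $X_s$ are i.i.d. fair $\pm 1$ coin flips (so $M$ is a nonnegative martingale), and estimates how often the path ever reaches $1/\alpha$ within a finite horizon. The function name, the fixed bet $\lambda = 0.5$, the horizon, and the number of paths are all hypothetical choices.

```python
import numpy as np


def ville_crossing_rate(alpha=0.05, bet=0.5, horizon=200, n_paths=20000, seed=0):
    """Estimate P(exists t <= horizon : M_t >= 1/alpha) by Monte Carlo.

    M_t is the running product of (1 + bet * X_s) with fair +/-1 outcomes X_s,
    so each factor has mean 1 and stays positive when |bet| < 1.
    """
    rng = np.random.default_rng(seed)
    # Fair coin flips: mean zero, as under a null hypothesis.
    x = rng.choice([-1.0, 1.0], size=(n_paths, horizon))
    # Wealth trajectories, one row per simulated path, starting from M_0 = 1.
    wealth = np.cumprod(1.0 + bet * x, axis=1)
    # A path "rejects" if its wealth ever reaches the threshold 1/alpha.
    crossed = (wealth >= 1.0 / alpha).any(axis=1)
    return float(crossed.mean())


alpha = 0.05
rate = ville_crossing_rate(alpha=alpha)
print(f"empirical crossing rate: {rate:.4f} (Ville bound: {alpha})")
```

Because $\mathbb{E}_P[M_0] = 1$ here, the theorem bounds the crossing probability by $\alpha$, and the empirical rate should sit at or below that level up to Monte Carlo error. This is the mechanism by which a test that rejects when a wealth process exceeds $1/\alpha$ controls its false-alarm rate at any stopping time.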

Figures (11)

Figure 1: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different amounts of nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is presented on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.
Figure 2: Left plot of each figure: Distribution of stopping times, over 1,000 runs, for various sequential tests. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Right plot of each figure: Various wealth process trajectories. Each line represents the median trajectory of a wealth process over 1,000 runs, with shaded areas indicating the 25% and 75% quantiles.
Figure B.1: Top: Distribution of stopping times, over 1,000 simulations, for various sequential tests across settings with varying proportions of streams with nonzero means. A test rejects when its corresponding wealth process exceeds $1/\alpha$ for $\alpha = 0.01$. The dashed vertical line is the empirical mean of the stopping times. Bottom: Trajectories of various wealth processes across settings with different amounts of nonzero means. Each line represents the median trajectory of a wealth process over 1,000 simulations, with shaded areas indicating the 25% and 75% quantiles. The y-axis is presented on a logarithmic scale. Wealth processes are clipped to $10^{-3}$ for visualization purposes.
...and 6 more figures

Theorems & Definitions (35)

Definition 3.1: Level-$\alpha$ Sequential Test
Definition 3.2: Stopping Time
Theorem 3.1: Ville's Inequality (Ville, 1939)
Proposition 4.1 (Chugg et al., 2023)
Theorem 4.1
Theorem 4.2: Stopping time of $\phi^{\textsf{ftrl}}$
Theorem 5.1: Stopping time of $\phi^{\textsf{prod}}$
Theorem 5.2: Stopping time of $\phi^{\textsf{ave}}$
Theorem 5.3: Stopping time of $\phi^{\textsf{balance}}$
proof
...and 25 more
