Sequential Outlier Hypothesis Testing under Universality Constraints

Jun Diao; Lin Zhou

TL;DR

This work analyzes sequential outlier hypothesis testing when both the nominal and anomalous distributions are unknown, under two universality constraints: one on the error probability and one on the expected stopping time. It derives tight large-deviation bounds for the case of exactly one outlier and extends the analysis to multiple-outlier settings, establishing GJS-based exponents under the error probability universality constraint and Rényi-based exponents under the expected stopping time universality constraint. The results show that sequential tests can outperform fixed-length tests in both misclassification and Bayesian exponents, and they quantify the penalty incurred when the number of outliers is unknown. The analysis uses the method of types to obtain rigorous bounds and offers practical insights for universal anomaly detection on finite alphabets, with avenues for non-asymptotic and continuous-domain extensions.

Abstract

We revisit sequential outlier hypothesis testing and derive bounds on achievable exponents when both the nominal and anomalous distributions are unknown. The task of outlier hypothesis testing is to identify the set of outliers, which are generated from an anomalous distribution, among all observed sequences, the majority of which are generated from a nominal distribution. In the sequential setting, one obtains a symbol from each sequence per unit time until a reliable decision can be made. For the case with exactly one outlier, our exponent bounds are tight, providing an exact large deviations characterization of sequential tests and strengthening a previous result of Li, Nitinawarat and Veeravalli (2017). In particular, the average sample size of our sequential test is bounded universally under any pair of nominal and anomalous distributions, and our sequential test achieves a larger Bayesian exponent than the fixed-length test, which could not be guaranteed by the sequential test of Li, Nitinawarat and Veeravalli (2017). For the case with at most one outlier, we propose a threshold-based test that has bounded expected stopping time under mild conditions, and we bound the exponential decay rate of error probabilities under each non-null hypothesis and the null hypothesis. Our sequential test resolves the tradeoff among the exponential decay rates of misclassification, false reject and false alarm probabilities for the fixed-length test of Zhou, Wei and Hero (TIT 2022). Finally, as a further step towards practical applications, we generalize our results to the case of multiple outliers and show that there is a penalty in the error exponents when the number of outliers is unknown.
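To make the sequential setting concrete, the following is a minimal Python sketch of a generic threshold-based sequential test for exactly one outlier on a finite alphabet. It is not the test proposed in the paper. The leave-one-out KL score, the stopping rule (stop once the second-smallest score exceeds a fixed threshold), and the names `scores`, `sequential_outlier_test`, `threshold` and `min_samples` are all assumptions chosen for illustration. The only property it relies on is that, once the true outlier is left out, the remaining empirical distributions agree with each other, so the score of the true outlier tends to be the smallest.

```python
import numpy as np

def empirical_pmf(seq, alphabet_size):
    """Empirical distribution (type) of an integer-valued sequence."""
    counts = np.bincount(seq, minlength=alphabet_size)
    return counts / counts.sum()

def kl(p, q):
    """KL divergence D(p || q) in nats, summed over the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def scores(types):
    """Leave-one-out score for each candidate outlier i (illustrative choice):
    sum over j != i of D(type_j || average of the types other than i)."""
    M = types.shape[0]
    total = types.sum(axis=0)
    out = np.empty(M)
    for i in range(M):
        avg = (total - types[i]) / (M - 1)
        out[i] = sum(kl(types[j], avg) for j in range(M) if j != i)
    return out

def sequential_outlier_test(streams, alphabet_size, threshold, min_samples=1):
    """streams: (M, n_max) integer array, one row per observed sequence.
    Takes one symbol per sequence per time step and stops at the first time
    n >= min_samples at which the second-smallest score exceeds `threshold`
    (hypothetical stopping rule). Returns (declared outlier index, stopping time)."""
    M, n_max = streams.shape
    decision = -1
    for n in range(min_samples, n_max + 1):
        types = np.array([empirical_pmf(streams[i, :n], alphabet_size)
                          for i in range(M)])
        s = scores(types)
        order = np.argsort(s)
        decision = int(order[0])
        if s[order[1]] > threshold:
            return decision, n
    # No confident stop within the available samples: return the current best guess.
    return decision, n_max
```

In this sketch, a larger `threshold` trades a longer stopping time for a more reliable decision; how such a threshold should be set to meet either universality constraint is exactly what the paper's analysis addresses and is not captured here.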

Paper Structure (50 sections, 14 theorems, 167 equations, 3 figures, 2 tables)

Introduction
Main Contributions
Other Related Works
Case of Exactly One Outlier
Problem Formulation
Existing Results for the Fixed-Length Test
Error Probability Universality Constraint
Test Design and Intuition
Main Results and Discussions
Expected Stopping Time Universality Constraint
Test Design and Intuition
Main Results and Discussions
Case of Exactly $T$ Outliers
Problem Formulation
Existing Results
...and 35 more sections

Key Result

Theorem 1

Given any pair of distributions $(P_\mathrm{N},P_\mathrm{A})\in\mathcal{P}(\mathcal{X})^2$ that are fully supported on the finite alphabet $\mathcal{X}$, for each $i\in[M]$, the achievable error exponent of the fixed-length test $\Phi_{\rm LNV}$ satisfies

Figures (3)

Figure 1: Simulated misclassification probability of our test under expected stopping time universality constraint for the case with exactly one outlier under generating distributions $(P_\mathrm{N},P_\mathrm{A})=\mathrm{Bern}(0.28,0.25)$ with $M=4$.
Figure 2: Illustration of how $\mathrm{LD}_\mathcal{B}(P_\mathrm{N},P_\mathrm{A},M,T)$ varies with $T$ under generating distributions $(P_\mathrm{N},P_\mathrm{A})=\mathrm{Bern}(0.25,0.3)$ with $M=30$.
Figure 3: Simulated misclassification probability of our test under expected stopping time universality constraint for the case with at most one outlier under generating distributions $(P_\mathrm{N},P_\mathrm{A})=\mathrm{Bern}(0.28,0.25)$ with $M=4$ and $(\lambda_1,\lambda_2)=(0.001,0.0005)$.

Theorems & Definitions (23)

Definition 1
Definition 2
Theorem 1
Theorem 2
Theorem 3
Definition 3
Theorem 4
Theorem 5
Definition 4
Theorem 6
...and 13 more
