Iterative Hypothesis Pruning and Distribution-based Early Labeling for Sequential Hypothesis Testing

George Vershinin, Asaf Cohen, Omer Gurewitz

TL;DR

This work tackles active sequential hypothesis testing with a Bayesian risk objective $\delta \mathbb{E}[N] + p_e$, introducing the deterministic Φ algorithm and a clustering-enhanced variant, Φ-Δ. The core idea is to prune hypotheses iteratively: at each stage, the decision-maker selects the action that maximizes the minimal Total Variation Distance (TVD) between the output distributions of the remaining candidates and, in Φ-Δ, clusters TVD-close hypotheses so that whole clusters can be eliminated at once. The authors derive finite- and asymptotic-performance guarantees, including a per-stage error bound $\le \delta/(H-1)$ and per-stage sample bounds that scale as $\frac{\log((H-1)/\delta)}{d_{jk|i}(a)}$, establish conditions under which the Bayes risk vanishes as $\delta \to 0$, and discuss computational complexity. Numerical results show that Φ-Δ, which exploits hypothesis structure via TVD-based clustering, can achieve orders-of-magnitude reductions in mean sample size compared to baselines such as GJL, NJ1, and the Chernoff scheme, highlighting practical gains for multi-hypothesis sequential testing in structured settings.
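To make the objective concrete, the following minimal Python sketch evaluates the Bayes risk $\delta \mathbb{E}[N] + p_e$ and the scaling of the per-stage sample bound. It rests on two assumptions: the numerator is read as $\log((H-1)/\delta)$, which matches the per-stage error bound $\delta/(H-1)$, and `separation` is a stand-in for $d_{jk|i}(a)$, which this summary does not define. The function names and all numbers are illustrative, not taken from the paper.

```python
import math


def bayes_risk(delta, mean_samples, p_error):
    """Bayes risk delta * E[N] + p_e: sampling cost plus error probability."""
    return delta * mean_samples + p_error


def stage_sample_scale(num_hypotheses, delta, separation):
    """Scaling log((H - 1) / delta) / d of the per-stage sample bound.

    `separation` stands in for d_{jk|i}(a) (an assumption of this sketch);
    constants and lower-order terms of the actual bound are omitted.
    """
    if num_hypotheses < 2 or not 0 < delta < 1 or separation <= 0:
        raise ValueError("need H >= 2, 0 < delta < 1, and separation > 0")
    return math.log((num_hypotheses - 1) / delta) / separation


# Illustrative values only: H = 32 hypotheses, separation 0.5.
for delta in (1e-2, 1e-4, 1e-6):
    n = stage_sample_scale(num_hypotheses=32, delta=delta, separation=0.5)
    risk = bayes_risk(delta, n, p_error=delta / 31)
    print(f"delta={delta:g}  sample scale={n:.1f}  risk={risk:.3g}")
```

Under this scaling the sample count grows only logarithmically in $1/\delta$, so the cost term $\delta \mathbb{E}[N]$ shrinks as $\delta \to 0$, which is in line with the vanishing Bayes risk described above.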

Abstract

We consider the problem where an active Decision-Maker (DM) is tasked with identifying the true hypothesis using as few samples as possible while maintaining accuracy. The DM collects samples according to its determined actions and knows the distributions under each hypothesis. We propose the $\Phi$-$\Delta$ algorithm, a deterministic and adaptive multi-stage hypothesis-elimination algorithm where the DM selects an action, applies it repeatedly, and discards hypotheses in light of its obtained samples. The DM selects actions based on maximal separation, expressed by the maximal minimal Total Variation Distance (TVD) between every pair of possible output distributions. To further optimize the search (in terms of the mean number of samples required to separate hypotheses), close distributions (in TVD) are clustered, and the algorithm eliminates whole clusters rather than individual hypotheses. We extensively analyze our algorithm and show it is asymptotically optimal as the desired error probability approaches zero. Our analysis also includes identifying instances when the algorithm is asymptotically optimal in the number of hypotheses, bounding the mean number of samples per stage and in total, characterizing necessary and sufficient conditions for vanishing error rates when clustering hypotheses, evaluating algorithm complexity, and discussing its optimality in finite regimes.
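The max-min TVD action-selection rule described in the abstract can be sketched as follows for discrete output distributions. This is a minimal illustration, not the paper's algorithm: the `models` layout, the function names, and the example probabilities are all hypothetical, and clustering and the elimination test are left out.

```python
from itertools import combinations


def tvd(p, q):
    """Total variation distance between two pmfs on the same alphabet."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def select_action(models, alive):
    """Pick the action whose closest pair of surviving hypotheses is farthest apart.

    models[a][i] is the pmf of the sample under hypothesis i when action a is
    taken (hypothetical layout); `alive` lists the hypotheses not yet discarded
    and must contain at least two indices.
    """
    scores = {
        action: min(tvd(pmfs[i], pmfs[j]) for i, j in combinations(alive, 2))
        for action, pmfs in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Two actions, three hypotheses, binary outputs (made-up numbers).
models = {
    "a0": [[0.5, 0.5], [0.6, 0.4], [0.9, 0.1]],
    "a1": [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]],
}
best, scores = select_action(models, alive=[0, 1, 2])
print(best, scores)
```

Here action `a0` leaves hypotheses 0 and 1 only 0.1 apart in TVD, while `a1` keeps every pair at least 0.4 apart, so the rule prefers `a1`. Because each action carries its own list of pmfs, the output alphabet may differ between actions.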

Paper Structure

This paper contains 23 sections, 13 theorems, 46 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

For any action $a$, there exists some $\eta_a \in (0, 1)$ such that: (i) $\Delta\mathcal{D}_{jk|i}(a) > \eta_a$; (ii) action $a$ can distinguish between at least two hypotheses after clustering; (iii) for any $\varepsilon_a \leq \eta_a$, the per-stage average error probability does not exceed ...

Figures (5)

  • Figure 1: System model. The DM is tasked to identify the correct hypothesis indexed by $\theta\in\mathcal{H}$. By taking action $A_n$ at time step $n$, the DM obtains a sample $X_n\sim f_\theta^{A_n}$. Note that the alphabet of $X_n$ depends on the action $A_n$.
  • Figure 2: Visualization of hypotheses clustering for $H=8$ for some action $a$ (different actions may have different clustering results). Each point corresponds to a different possible output distribution under each hypothesis (i.e., $f_i^a$). Here, hypotheses indexed $\{2, 4, 5, 8\}$ are clustered together, and are represented by $H_2$, whereas hypotheses indexed $\{0, 1, 3, 6\}$ are clustered and are represented by $H_0$. Some hypotheses can be isolated, e.g., $H_7$.
  • Figure 3: A decision tree corresponding to some action selection policy. The edge weights are the mean number of samples required to separate the hypotheses on each level.
  • Figure 4: The ABR for $H = 32$ using Algorithm 1 (with the same proximity parameter for all actions, $\varepsilon = 0.4$) against GJL, NJ1 (with $\tilde{\rho} = 0.8$), and the Chernoff scheme when all obtained samples follow unit-variance normal distributions with randomly drawn means.
  • Figure 5: The ABR in the same settings as in Figure 4, but all obtained samples follow exponential distributions with randomly drawn means rather than normal distributions.
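
The clustering pictured in Figure 2 can be illustrated for the unit-variance normal setting of Figure 4, where the TVD has the closed form $2\Phi(|\mu_1 - \mu_2|/2) - 1$, with $\Phi$ here denoting the standard normal CDF. The sketch below is hypothetical: the greedy rule, the choice of the first member as cluster representative, and the use of $\varepsilon$ directly as a TVD threshold are assumptions made for illustration and may differ from the paper's procedure.

```python
import math


def tvd_unit_normal(mu1, mu2):
    """TVD between N(mu1, 1) and N(mu2, 1), i.e. 2 * Phi(|mu1 - mu2| / 2) - 1."""
    return math.erf(abs(mu1 - mu2) / (2.0 * math.sqrt(2.0)))


def greedy_clusters(means, eps):
    """Group hypothesis indices whose TVD to a cluster representative is <= eps.

    Hypothetical rule: scan the means in increasing order; the first member of
    each cluster acts as its representative.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    clusters = []
    for i in order:
        if clusters and tvd_unit_normal(means[clusters[-1][0]], means[i]) <= eps:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


# Made-up means for six hypotheses under a single action.
means = [0.0, 0.3, 0.9, 2.5, 2.8, 5.0]
clusters = greedy_clusters(means, eps=0.4)
print(clusters)
```

With a threshold of 0.4, means within roughly 1.05 of a representative fall into its cluster, so these six hypotheses collapse into three groups, one of which is an isolated hypothesis like $H_7$ in Figure 2.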

Theorems & Definitions (13)

  • Lemma 1
  • Proposition 1: Selecting epsilon and a Representative for Exponential Distributions
  • Proposition 2: Selecting epsilon and a Representative for Unit-Variance Normal Distributions
  • Theorem 1
  • Lemma 2
  • Corollary 1: Asymptotic Optimality
  • Theorem 2
  • Theorem 3
  • Proposition 3
  • Proposition 4
  • ...and 3 more