Non-Adaptive Multi-Stage Algorithm and Bounds for Group Testing with Prior Statistics

Ayelet C. Portnoy; Amit Solomon; Alejandro Cohen

TL;DR

The paper tackles non-adaptive group testing with general correlated priors by introducing a two-stage Multi-Stage GT (MSGT) framework: the first stage reduces the search space, and the second applies a List Viterbi Algorithm (LVA) to generate candidate population trajectories under trellis-based priors. A final MAP-based selection keeps performance close to that of a MAP decoder while maintaining feasible computational complexity. The analysis includes bounds showing the test-reduction potential and a sufficiency bound for MAP decoders under arbitrary correlations. The authors demonstrate that MSGT can achieve MAP-like recovery with substantial test reductions (at least 25% in practical COVID-19 and sparse-signal regimes) and provide complexity guarantees, making it scalable to moderately large populations $N$ with a small number of defective items $K$. They further develop a Gilbert-Elliott model to evaluate the impact of general Markov priors on the MAP bound and validate the framework through extensive numerical experiments, including regimes where ML/MAP decoders are computationally infeasible. Overall, the work integrates trellis-based priors, DND/DD preprocessing, and a parallel LVA to deliver an efficient, principled GT algorithm with provable bounds and practical relevance for disease detection and sparse signal recovery.

Abstract

In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in a trellis, e.g., finite state machines and Markov processes. We introduce a variation of the List Viterbi Algorithm (LVA) that exploits the structure of the correlated prior statistics to enable accurate recovery using far fewer tests than objectives. We also provide a sufficiency bound on the number of pooled tests required by any Maximum A Posteriori (MAP) decoder with an arbitrary correlation between infected items. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can obtain the optimal MAP performance with feasible complexity in practical regimes, such as COVID-19 and sparse signal recovery applications, and, in the scenarios tested, reduce the number of pooled tests by at least 25% compared to existing classical low-complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm, which guarantees its efficiency.
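To make the idea of a trellis prior with list decoding concrete, the following is a minimal sketch of a generic parallel List Viterbi pass over a two-state Markov chain (state 1 = defective). It is not the paper's LVA variation: the function names, the prior parameters `pi` and `A`, and the per-item `allowed` sets (standing in for whatever an earlier stage has already ruled in or out) are all assumptions made for illustration, and the final filter only checks the number of defective items.

```python
import heapq
import math

def list_viterbi(allowed, pi, A, L):
    """Return up to L most probable state sequences as (log_prob, path).

    allowed[i] is the set of states item i may take (hypothetical input,
    e.g. {0} for an item already known to be non-defective).
    pi[s] is the initial probability, A[r][s] the transition probability.
    """
    # For every state keep the L best partial paths ending in that state.
    beams = {s: [(math.log(pi[s]), (s,))] for s in allowed[0] if pi[s] > 0}
    for states in allowed[1:]:
        new_beams = {}
        for s in states:
            candidates = [
                (lp + math.log(A[r][s]), path + (s,))
                for r, paths in beams.items() if A[r][s] > 0
                for lp, path in paths
            ]
            new_beams[s] = heapq.nlargest(L, candidates)
        beams = new_beams
    merged = [c for paths in beams.values() for c in paths]
    return heapq.nlargest(L, merged)  # sorted, most probable first

def pick_with_k_defectives(candidates, K):
    """Most probable candidate with exactly K defective items, else None."""
    for _, path in candidates:
        if sum(path) == K:
            return path
    return None

# Hypothetical prior: defectives are rare but tend to cluster.
pi = [0.9, 0.1]
A = [[0.9, 0.1], [0.5, 0.5]]
allowed = [{0}, {0, 1}, {0, 1}, {1}]
top = list_viterbi(allowed, pi, A, L=2)
best = pick_with_k_defectives(top, K=2)
```

In this toy run the most probable trajectory has only one defective item, so the selection step falls through to the second list entry. A fuller decoder would also check each candidate against the observed test outcomes before choosing.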

Paper Structure (28 sections, 8 theorems, 67 equations, 8 figures, 6 algorithms)

Introduction
Problem Formulation
Main Results
Pool-Testing Algorithm
Testing Matrix and Pooling
Recovery Process
Analytical Results
Discussion
MAP Analytical Bound for GT with General Correlated Prior Statistics
Definitions and notations
Probability of Error
Sufficiency GT MAP Bound
Gilbert Elliott calculations
Numerator of Eq. \ref{eq:P_GE}
Denominator of Eq. \ref{eq:P_GE}
...and 13 more sections

Key Result

Theorem 1

Consider a group test with a Bernoulli testing matrix with $p=\ln{2}/K$, and $T$ tests as $K\rightarrow \infty$. Let $P_{e,a}^{(DND)} \triangleq N^{-\alpha \left( 1-\ln{2}/K\right) /2}$ for $\alpha \triangleq T/(K \log_2{N})$. The expected number of possibly defective items is bounded by
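As a rough numeric illustration, the quantities $\alpha$ and $P_{e,a}^{(DND)}$ defined in the theorem statement can be evaluated directly. The sketch below assumes the reading $\alpha = T/(K\log_2 N)$ and uses the ML upper bound $T_{ML} = (1+\epsilon)K\log_2 N$ quoted in the Figure 4 caption. The function names and the example values of $N$ and $K$ are hypothetical.

```python
import math

def alpha(T, K, N):
    """alpha = T / (K * log2 N), assuming this grouping of the definition."""
    return T / (K * math.log2(N))

def dnd_error_term(T, K, N):
    """P_{e,a}^{(DND)} = N ** (-alpha * (1 - ln2 / K) / 2)."""
    return N ** (-alpha(T, K, N) * (1 - math.log(2) / K) / 2)

def ml_upper_bound(K, N, eps):
    """T_ML = (1 + eps) * K * log2 N, as quoted for the ML upper bound."""
    return (1 + eps) * K * math.log2(N)

# Hypothetical example values.
N, K = 1024, 8
T = ml_upper_bound(K, N, eps=0.25)
print(T, alpha(T, K, N), dnd_error_term(T, K, N))
```

Under this reading, choosing $T = T_{ML}$ gives $\alpha = 1+\epsilon$, and $P_{e,a}^{(DND)}$ shrinks as more tests are added.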

Figures (8)

Figure 1: For an unknown population $\mathbf{U} \in \left\{0,1\right\}^9$ with $K=2$, a random testing matrix is sampled and the test result $Y$ is calculated.
Figure 2: First stage of MSGT. (a) The first step of Stage 1, the DND algorithm, reveals 5 DND items in $\mathbf{U}$, forming $\mathcal{P}^{(DND)}$. Since items participating in negative tests must be non-defective, we mark all the participants in the two negative test results as non-defective. (b) The second step of Stage 1, the DD algorithm, outputs $\mathcal{P}^{(DD)}$ that includes a single DD item, based on the first test result, as it is the only possibly defective item participating in this test. The two other positive test results do not contribute to our knowledge here because there is more than one possibly defective item participating in them.
Figure 3: Stage 2 of MSGT. (a) All the possible transitions in the state space that we consider in the LVA step, following the insights obtained in Stage 1. These transitions aggregate to a total of 6 trajectories. (b) The two most likely trajectories returned by LVA (assuming $L=2$). Given $K=2$, the black trajectory corresponds to a valid population vector $\mathbf{U}$ with 2 defective items, while the gray trajectory indicates an invalid population with 3 defective items instead. Consequently, in the subsequent step, MSGT will extract two candidate defective sets, $\left\{U_6,U_8\right\}$ and $\left\{U_7,U_8\right\}$, and will finally choose the most likely one using the MAP estimator. (c) Comparison of Stage 2 to ML. With $T=5$, we use the first 5 rows of the testing matrix, ignoring the last test result. This leaves 3 possibly defective items, forming two potentially defective sets of size $K=2$. Using ML, one set is chosen randomly, leading to an error probability of 0.5. With $T=6$, based on the third and sixth test results, there is only one set of size $K=2$ that matches the outcome $Y$, resulting in successful decoding with the ML decoder. As shown above, MSGT's Stage 2 can successfully decode $\mathbf{U}$ with just $T=5$, because the LVA step narrows the candidates down to only 2 trajectories, and the final estimate is then selected based on the given prior information and the insights gained in Stage 1.
Figure 4: Numerical evaluation for theoretical results and bounds. The results in (a), (b), and (c) are over 1000 iterations. For ML Upper Bound (UB), $T_{ML} = (1+\epsilon)K\log_{2}N$, for any $\epsilon>0$ [atia2012boolean]. In particular, $\epsilon=0.25$ in the results presented herein.
Figure 5: Success probability of MSGT, MAP, ML and DD over 1000 iterations. A comparison to ML and MAP is not presented in (b) and (c), as they are not feasible for populations of those sizes due to the computational complexity burden.
...and 3 more figures
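The pooling step of Figure 1 and the two Stage 1 steps of Figure 2 can be sketched as follows, assuming noiseless tests and a Bernoulli testing matrix with $p=\ln 2/K$ as in the Key Result. This is a minimal illustration of the standard DND and DD rules described in the captions, not the authors' implementation. All function names are hypothetical.

```python
import numpy as np

def bernoulli_matrix(T, N, K, rng):
    """T x N boolean testing matrix; each entry is 1 with probability ln2/K."""
    return rng.random((T, N)) < np.log(2) / K

def outcomes(X, u):
    """Noiseless outcome: a test is positive iff it contains a defective item."""
    return (X & u).any(axis=1)

def dnd(X, y):
    """Mask of possibly defective items.

    Every participant of a negative test is definitely non-defective,
    so only items that appear in no negative test remain.
    """
    return ~X[~y].any(axis=0)

def dd(X, y, possibly_defective):
    """Mask of definitely defective items.

    A positive test with exactly one possibly defective participant
    pins that participant down as defective.
    """
    found = np.zeros(X.shape[1], dtype=bool)
    restricted = X & possibly_defective
    for t in np.flatnonzero(y):
        members = np.flatnonzero(restricted[t])
        if members.size == 1:
            found[members[0]] = True
    return found

# Hypothetical run mirroring the figure sizes (N = 9, K = 2).
rng = np.random.default_rng(0)
u = np.zeros(9, dtype=bool)
u[[5, 7]] = True
X = bernoulli_matrix(T=6, N=9, K=2, rng=rng)
y = outcomes(X, u)
pd_mask = dnd(X, y)
dd_mask = dd(X, y, pd_mask)
```

In the noiseless setting neither step can make a mistake: truly defective items always survive the DND step, and every item flagged by DD is truly defective. The items left in between are the ones Stage 2 has to resolve.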

Theorems & Definitions (17)

Theorem 1: Cohen et al. [cohen2021multi]
proof
Theorem 2
proof
Theorem 3
proof
Proposition 1
proof
Theorem 4
proof
...and 7 more
