Isolate and then Identify: Rethinking Adaptive Group Testing

Hsin-Po Wang, Venkatesan Guruswami

TL;DR

Isolate and then Identify (I@I) is a modular adaptive group testing (GT) scheme. It first isolates sick individuals by partitioning the population into teams and re-dividing until every team contains at most one sick person, and then identifies the sick individual within each such team using capacity-achieving codes. For a binary symmetric channel with crossover probability $p$ and capacity $C(Z)$, I@I uses $(1+o(1)) k \log_2(n/k)/C(Z)$ tests as the leading term plus an additive $\mathcal{O}(k \log k/(1-2p)^2)$, with decoding time $\mathcal{O}(k \log(n)^2 / C(Z)^2)$ and vanishing average false positives and false negatives; the design is modular and customizable for generic channels. The approach relies on a Poisson-based analysis of team compositions, hypothesis testing to classify teams, and capacity-achieving codes (e.g., Barg-Zémor or polar codes) to identify the sick individual within each exact team. The framework offers a transparent, low-complexity route to near-optimal adaptive GT and can be tailored to broader channel models and multi-sick scenarios; several questions remain open on optimizing channel-dependent constants and the number of rounds.

Abstract

Group testing (GT) is the art of identifying binary signals and the marketplace for exchanging new ideas for related fields such as unique-element counting, compressed sensing, traitor tracing, and genotyping. A GT scheme can be nonadaptive or adaptive; the latter is preferred when latency is less of an issue. To construct adaptive GT schemes, a popular strategy is to spend the majority of tests in the first few rounds to gain as much information as possible, and use later rounds to refine details. In this paper, we propose a transparent strategy called "isolate and then identify" (I@I). In the first few rounds, I@I divides the population into teams until every team contains at most one sick person. Then, in the last round, I@I identifies the sick person in each team. Performance-wise, I@I is the first GT scheme that achieves the optimal coefficient $1/$capacity$(Z)$ for the $k \log_2 (n/k)$ term in the number of tests when $Z$ is a generic channel corrupting the test outcomes. I@I follows a modular methodology whereby the isolating part and the identification part can be optimized separately.
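The identification round can be illustrated with a minimal sketch. It assumes a team that contains exactly one sick person, pools members by the bits of their position, and repeats each pooled test with a majority vote to absorb BSC noise. The repetition code is a stand-in chosen for brevity and is not capacity-achieving; the function name `identify_in_exact_team` and the parameters `repeats` and `rng` are hypothetical and do not come from the paper.

```python
import random

def identify_in_exact_team(team, sick, p=0.05, repeats=15, rng=None):
    """Locate the single sick member of `team` via noisy bit-indexed pooled tests.

    Assumptions (not from the paper): a repetition code with majority vote
    stands in for a capacity-achieving code, and the noise is a BSC(p).
    """
    rng = rng or random.Random(0)
    num_bits = max(1, (len(team) - 1).bit_length())
    index = 0
    for bit in range(num_bits):
        # Pool every member whose position in the team has this bit set.
        pool = [m for pos, m in enumerate(team) if (pos >> bit) & 1]
        truth = sick in pool
        # Each test outcome is flipped independently with probability p.
        votes = sum(truth ^ (rng.random() < p) for _ in range(repeats))
        if 2 * votes > repeats:
            index |= 1 << bit
    # A decoded index beyond the team signals a decoding failure.
    return team[index] if index < len(team) else None
```

This sketch spends `repeats` tests per bit of the position, so its test count is a constant factor above $\log_2(\text{team size})/C(Z)$; replacing the repetition code with a capacity-achieving code is what closes that gap.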

Paper Structure (15 sections, 3 theorems, 8 equations, 1 figure, 2 tables)

This paper contains 15 sections, 3 theorems, 8 equations, 1 figure, 2 tables.

Key Result

Theorem 1

Let $Z$ be a BSC with crossover probability $p$ that models the noisy test outputs; let $C(Z) \coloneqq 1 + p \log_2(p) + (1 - p) \log_2(1 - p)$ be its capacity. Suppose $n/k \to \infty$. To find $k$ sick people among $n$, I@I uses $(1 + o(1)) k \log_2(n/k) / C(Z) + \mathcal{O}(k \log(k) / (1 - 2p)^2)$ tests …
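To get a feel for the leading term, the capacity formula above can be evaluated numerically. The sketch below computes $C(Z)$ for a BSC and the quantity $k \log_2(n/k)/C(Z)$; it ignores the $(1 + o(1))$ factor and the additive term, so it is an order-of-magnitude guide rather than an exact test count. The function names are hypothetical.

```python
import math

def bsc_capacity(p):
    """Capacity C(Z) = 1 + p*log2(p) + (1-p)*log2(1-p) of a BSC, in bits."""
    if p in (0.0, 1.0):
        return 1.0  # limit value, since x*log2(x) -> 0 as x -> 0
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

def leading_term_tests(n, k, p):
    """Leading term k*log2(n/k)/C(Z); omits the (1+o(1)) factor and additive term."""
    return k * math.log2(n / k) / bsc_capacity(p)
```

For example, `leading_term_tests(1024, 1, 0.0)` returns `10.0`, matching the noiseless binary search bound, while a crossover probability near 0.11 roughly doubles the leading term because the capacity drops to about one half.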

Figures (1)

  • Figure 1: The isolating part of I@I. A row is a round. Empty teams are discarded. Exact teams are kept for the last round. Twoplus teams are re-divided. In this case, $(k, k^{(1)}, k^{(2)}, k^{(3)}, k^{(4)}) = (13, 10, 7, 3, 0)$
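The rounds described in the Figure 1 caption can be simulated with a short sketch. It makes two simplifying assumptions that are not taken from the paper: teams are classified as empty, exact, or twoplus by a noiseless oracle (the paper uses hypothesis testing on noisy outcomes), and each round deals the remaining people into as many teams as there are sick people left. It also reads $k^{(r)}$ as the number of sick people still inside twoplus teams after round $r$, which is an interpretation of the caption. The function name `isolate` is hypothetical.

```python
import random

def isolate(population, sick, rng=None):
    """Re-divide teams until every kept team holds exactly one sick person.

    Assumptions (not from the paper): a noiseless oracle classifies teams,
    and each round targets about one sick person per team.
    Returns the exact teams and, per round, the number of sick people
    still sitting in twoplus teams.
    """
    rng = rng or random.Random(0)
    sick = set(sick)
    pending = [list(population)]  # teams that still need to be re-divided
    exact, remaining = [], []
    while pending:
        people = [person for team in pending for person in team]
        k_left = sum(person in sick for person in people)
        rng.shuffle(people)
        # Deal the people round-robin into k_left teams (at least one).
        num_teams = max(1, k_left)
        teams = [people[i::num_teams] for i in range(num_teams)]
        pending = []
        for team in teams:
            count = sum(person in sick for person in team)
            if count == 1:
                exact.append(team)    # kept for the identification round
            elif count >= 2:
                pending.append(team)  # twoplus: re-divided next round
            # count == 0: empty team, discarded
        remaining.append(sum(p in sick for team in pending for p in team))
    return exact, remaining
```

Calling `isolate(range(200), {3, 17, 42, 99, 150})` returns five exact teams and a non-increasing list that ends in 0, playing the role of $(k^{(1)}, k^{(2)}, \dots)$ in the caption.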

Theorems & Definitions (3)

  • Theorem 1: main
  • Lemma 2
  • Theorem 3: generic channel