A Probably Approximately Correct Analysis of Group Testing Algorithms

Sameera Bharadwaja H., Chandra R. Murthy

TL;DR

This work reframes non-adaptive group testing with random pooling as a PAC-learning problem to derive finite-sample sufficiency bounds on the number of tests $m$ needed for approximate defective-set recovery. It provides PAC-based bounds for three practical decoders, column matching (CoMa), combinatorial basis pursuit (CBP), and definite defectives (DD), under false-positive-only (FP-only) and false-negative-only (FN-only) error models, using a Bernoulli design and a near-constant row-weight design, and it explicitly links the error tolerance $\epsilon$ and confidence $1-\delta$ to $m$ through order-wise expressions. A key contribution is the optimization of design parameters in CBP (Chernoff parameter) and the extension of the coupon collector analysis to partial collection, enabling tighter bounds than prior exact-recovery results. The paper also offers non-asymptotic insights, visualizing the testing-rate surface and contours to illustrate trade-offs between accuracy and confidence, with simulations showing tight alignment between PAC bounds and empirical performance. Collectively, the results provide practical guidance for choosing the number of tests in approximate group testing and highlight the PAC framework as a unified lens for exact and approximate recovery across randomized designs.

Abstract

We consider the problem of identifying the defectives from a population of items via a non-adaptive group testing framework with a random pooling-matrix design. We analyze the sufficient number of tests needed for approximate set identification, i.e., for identifying almost all the defective and non-defective items with high confidence. To this end, we view the group testing problem as a function learning problem and develop our analysis using the probably approximately correct (PAC) framework. Using this formulation, we derive sufficiency bounds on the number of tests for three popular binary group testing algorithms: column matching, combinatorial basis pursuit, and definite defectives. We compare the derived bounds with the existing ones in the literature for exact recovery theoretically and using simulations. Finally, we contrast the three group testing algorithms under consideration in terms of the sufficient testing rate surface and the sufficient number of tests contours across the range of the approximation and confidence levels.
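The setting described in the abstract can be illustrated with a minimal simulation sketch. It assumes noiseless tests, a Bernoulli pooling matrix with inclusion probability `p = 1/k` (an illustrative choice, not the paper's optimized value), and textbook forms of the column-matching and definite-defectives decoding rules. All function names and the values of `n`, `k`, and `m` are hypothetical.

```python
import numpy as np


def bernoulli_design(m, n, p, rng):
    """Random m x n pooling matrix: item j joins test i independently w.p. p."""
    return rng.random((m, n)) < p


def run_tests(A, defective):
    """Noiseless outcomes: a test is positive iff it contains a defective item."""
    return A[:, defective].any(axis=1)


def coma_decode(A, y):
    """Column-matching style rule: an item is declared defective unless it
    appears in at least one negative test."""
    in_negative_test = A[~y].any(axis=0)
    return ~in_negative_test


def dd_decode(A, y):
    """Definite-defectives style rule: keep the items that survive the
    negative tests, then declare defective only those that are the sole
    surviving item in some positive test."""
    possible = coma_decode(A, y)
    n_possible = (A & possible).sum(axis=1)   # surviving items per test
    decisive = y & (n_possible == 1)          # positive tests with one survivor
    return (A[decisive] & possible).any(axis=0)


# Illustrative (assumed) problem size: n items, k defectives, m tests.
rng = np.random.default_rng(0)
n, k, m = 500, 10, 200
defective = np.zeros(n, dtype=bool)
defective[rng.choice(n, size=k, replace=False)] = True

A = bernoulli_design(m, n, p=1.0 / k, rng=rng)
y = run_tests(A, defective)
coma_hat = coma_decode(A, y)
dd_hat = dd_decode(A, y)

print("CoMa false positives:", int((coma_hat & ~defective).sum()))
print("DD false negatives:  ", int((defective & ~dd_hat).sum()))
```

Under these noiseless assumptions the first rule can only err by declaring extra items defective, and the second can only err by missing defectives, which is one way to read the FP-only and FN-only error models mentioned in the TL;DR.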

Paper Structure

This paper contains 27 sections, 8 theorems, 79 equations, 15 figures, 1 table.

Key Result

Lemma 1

Let $\mathcal{D}$ be a distribution such that $\mathbb{P}_{\mathcal{D}}(a_{j} = 1) \in (0, 1)$ for all $j \in [n]$, and the $a_j$ are independent. Let $\mathcal{C}$ denote the set of all $k$-literal OR-ing functions in $n$-dimensional space, where $k < n$. Let $\hat{x}: \{0,1\}^n \to \{0,1\}$ (correspondingly ...
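To make the objects in the lemma concrete, the sketch below evaluates a $k$-literal OR function and computes the probability that two such functions disagree on a random input. It assumes that every $a_j$ equals 1 with the same probability `p` (the lemma only requires independence), and it uses disagreement probability as one natural notion of approximation error. Whether this matches the paper's exact definition is an assumption, and all names are hypothetical.

```python
from itertools import product


def or_function(support, a):
    """k-literal OR: 1 iff at least one coordinate of a in `support` is 1."""
    return int(any(a[j] for j in support))


def disagreement_probability(true_support, est_support, p):
    """P(f(a) != f_hat(a)) when the a_j are i.i.d. Bernoulli(p) (assumed)."""
    q = 1.0 - p
    K, K_hat = set(true_support), set(est_support)
    # f = 0 and f_hat = 1: all of K is zero, some index in K_hat \ K is one.
    false_pos = q ** len(K) * (1.0 - q ** len(K_hat - K))
    # f = 1 and f_hat = 0: all of K_hat is zero, some index in K \ K_hat is one.
    false_neg = q ** len(K_hat) * (1.0 - q ** len(K - K_hat))
    return false_pos + false_neg


def brute_force_disagreement(n, true_support, est_support, p):
    """Same quantity by enumerating all 2^n inputs (small n only)."""
    total = 0.0
    for a in product((0, 1), repeat=n):
        if or_function(true_support, a) != or_function(est_support, a):
            ones = sum(a)
            total += p ** ones * (1.0 - p) ** (n - ones)
    return total


closed_form = disagreement_probability({0, 1}, {1, 2}, p=0.3)
enumerated = brute_force_disagreement(4, {0, 1}, {1, 2}, p=0.3)
print(closed_form, enumerated)
```

The closed form depends on the two supports only through their sizes and the sizes of their set differences, so an estimate that misses or adds a few items yields a small but nonzero disagreement probability.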

Figures (15)

  • Figure 1: Comparison of the solution of MINLP using grid-search vs. the implicit equations for $m_S$, $p_\text{opt}$ and $g_\epsilon$ at $\delta = 0.01$ when $(n, k) \in \{(2500, 50), (10000, 200), (10000, 100)\}$ over various values of the approximation error tolerance, $\epsilon$.
  • Figure 2: Comparison of the sufficiency bounds given in \ref{eq:CBP_s_Bound_Cor_sStar} and \ref{eq:CBP_s_Bound_Cor_sStar_Exact} with $n = 2500$, $k = 50$, $s = s^*$ and $c = 1/2$.
  • Figure 3: (Left) Comparison of the sufficiency bound in \cite{Chan_Jaggi_Saligrama_Agnihotri_2014} and the theoretical PAC bounds \ref{eq:CoMa_Bound} on the testing rate; (Right) theoretical and simulated testing rates at different error tolerance values, for the CoMa algorithm.
  • Figure 4: (Left) Comparison of the sufficiency bound in \cite{Chan_Jaggi_Saligrama_Agnihotri_2014} and the theoretical PAC bounds \ref{eq:CBP_s_Bound} on the testing rate; (Right) theoretical and simulated testing rates at different error tolerance values, for the CBP algorithm.
  • Figure 5: (Left) Comparison of the sufficiency bound in \cite{Aldridge_Balsassini_2014} and the theoretical PAC bounds \ref{eq:DD_Bound} on the testing rate; (Right) theoretical and simulated testing rates at different error tolerance values, for the DD algorithm.
  • ...and 10 more figures

Theorems & Definitions (16)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Theorem 2
  • Corollary 1
  • Lemma 4
  • Theorem 3
  • Proof of Lemma \ref{lem:defective_set_learnt_function_equivalence}
  • Proof of Lemma \ref{lem:probability_CoMa}
  • ...and 6 more