A Probably Approximately Correct Analysis of Group Testing Algorithms
Sameera Bharadwaja H., Chandra R. Murthy
TL;DR
This work reframes non-adaptive group testing with random pooling as a PAC-learning problem in order to derive finite-sample sufficiency bounds on the number of tests $m$ needed for approximate recovery of the defective set. It provides PAC-based bounds for three practical decoders: column matching (CoMa), combinatorial basis pursuit (CBP), and definite defectives (DD). The bounds cover false-positive-only (FP-only) and false-negative-only (FN-only) error models under a Bernoulli design and a near-constant row-weight design, and they explicitly link the error tolerance $\epsilon$ and the confidence level $1-\delta$ to $m$ through order-wise expressions. Key contributions are the optimization of the CBP design parameter (the Chernoff parameter) and the extension of the coupon collector analysis to partial collection, which together yield tighter bounds than prior exact-recovery results. The paper also offers non-asymptotic insights by visualizing the testing-rate surface and the contours of the sufficient number of tests, illustrating the trade-off between accuracy and confidence; simulations show close agreement between the PAC bounds and empirical performance. Collectively, the results provide practical guidance for choosing the number of tests in approximate group testing and position the PAC framework as a unified lens for exact and approximate recovery under randomized designs.
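As background for the partial-collection idea mentioned above, the sketch below computes the classical expected number of uniform draws (with replacement) needed to see $k$ of $n$ distinct coupons, $n(H_n - H_{n-k})$, and checks it by Monte Carlo. This is the textbook expectation only, not the paper's bound, and the function names are hypothetical.

```python
import random


def expected_draws_partial(n: int, k: int) -> float:
    """Expected number of uniform draws (with replacement) from n coupons
    until k distinct coupons have been seen.

    With i distinct coupons already collected, each draw is new with
    probability (n - i) / n, so the wait for the next new coupon is
    geometric with mean n / (n - i). Summing for i = 0, ..., k - 1 gives
    n * (H_n - H_{n-k}), where H_j is the j-th harmonic number.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(n / (n - i) for i in range(k))


def simulate_partial(n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < k:
            seen.add(rng.randrange(n))  # one uniform draw
            draws += 1
        total += draws
    return total / trials


if __name__ == "__main__":
    n = 100
    for k in (n, 90):  # full collection vs. 90% collection
        print(k, round(expected_draws_partial(n, k), 1), round(simulate_partial(n, k), 1))
```

For $n = 100$, collecting every coupon takes about 519 draws in expectation, while collecting 90 of them takes about 226, because the last few coupons are the most expensive to find. This gives one intuition for why approximate recovery can need far fewer tests than exact recovery.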
Abstract
We consider the problem of identifying the defective items in a population via a non-adaptive group testing framework with a random pooling-matrix design. We analyze the number of tests sufficient for approximate set identification, i.e., for identifying almost all of the defective and non-defective items with high confidence. To this end, we view the group testing problem as a function learning problem and develop our analysis using the probably approximately correct (PAC) framework. Using this formulation, we derive sufficiency bounds on the number of tests for three popular binary group testing algorithms: column matching, combinatorial basis pursuit, and definite defectives. We compare the derived bounds with existing bounds for exact recovery, both theoretically and through simulations. Finally, we contrast the three group testing algorithms in terms of the sufficient testing-rate surface and the contours of the sufficient number of tests across a range of approximation and confidence levels.
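The decoders and the accuracy/confidence trade-off described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: noiseless Boolean OR tests (the FP-only and FN-only noise models are not modeled), a Bernoulli design with inclusion probability $p = 1/k$, and an error metric equal to the fraction of misclassified items. It implements the standard noiseless column matching and definite defectives rules, omits CBP, and estimates by Monte Carlo the smallest $m$ on a grid for which the error is at most $\epsilon$ in at least a $1-\delta$ fraction of trials. All function names are hypothetical, and the output is an empirical illustration, not the paper's bounds.

```python
import random


def bernoulli_design(m, n, p, rng):
    """Pooling matrix A (m tests x n items); each entry is 1 w.p. p, i.i.d."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(m)]


def run_tests(A, defectives):
    """Noiseless Boolean OR: a test is positive iff it pools >= 1 defective."""
    return [int(any(row[j] for j in defectives)) for row in A]


def coma_decode(A, y):
    """Column matching: clear every item that appears in some negative test
    and declare all remaining items defective (assumes m >= 1)."""
    n = len(A[0])
    cleared = set()
    for row, outcome in zip(A, y):
        if outcome == 0:
            cleared.update(j for j in range(n) if row[j])
    return set(range(n)) - cleared


def dd_decode(A, y):
    """Definite defectives: among the items CoMa leaves (possible defectives),
    declare defective only those that are the sole possible defective in some
    positive test; everything else is declared non-defective."""
    possible = coma_decode(A, y)
    definite = set()
    for row, outcome in zip(A, y):
        if outcome == 1:
            pooled = [j for j in possible if row[j]]
            if len(pooled) == 1:
                definite.add(pooled[0])
    return definite


def misclassified_fraction(estimate, truth, n):
    """Assumed error metric: fraction of the n items that are misclassified."""
    return len(estimate ^ truth) / n


def empirical_success(decoder, n, k, m, eps, trials=200, seed=1):
    """Fraction of random trials whose error is <= eps, i.e., an empirical
    estimate of the confidence 1 - delta achieved with m tests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        truth = set(rng.sample(range(n), k))
        A = bernoulli_design(m, n, 1.0 / k, rng)  # assumed p = 1/k
        y = run_tests(A, truth)
        if misclassified_fraction(decoder(A, y), truth, n) <= eps:
            hits += 1
    return hits / trials


def smallest_sufficient_m(decoder, n, k, eps, delta, m_grid, trials=200):
    """First m on the grid whose empirical success reaches 1 - delta
    (None if no grid point does)."""
    for m in m_grid:
        if empirical_success(decoder, n, k, m, eps, trials) >= 1 - delta:
            return m
    return None


if __name__ == "__main__":
    n, k, eps, delta = 100, 4, 0.01, 0.1
    grid = range(10, 101, 10)
    for name, dec in (("CoMa", coma_decode), ("DD", dd_decode)):
        print(name, smallest_sufficient_m(dec, n, k, eps, delta, grid, trials=100))
```

In the noiseless setting, column matching never misses a defective (all its errors are false positives), while definite defectives never declares a non-defective item defective (all its errors are false negatives), so the two decoders can reach a given $\epsilon$ at different values of $m$.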
