Optimal Dorfman Group Testing for Symmetric Distributions

Nicholas C. Landolfi, Sanjay Lall

Abstract

We study Dorfman's classical group testing protocol in a novel setting where individual specimen statuses are modeled as exchangeable random variables. We are motivated by infectious disease screening. In that case, specimens which arrive together for testing often originate from the same community and so their statuses may exhibit positive correlation. Dorfman's protocol screens a population of n specimens for a binary trait by partitioning it into non-overlapping groups, testing these, and only individually retesting the specimens of each positive group. The partition is chosen to minimize the expected number of tests under a probabilistic model of specimen statuses. We relax the typical assumption that these are independent and identically distributed and instead model them as exchangeable random variables. In this case, their joint distribution is symmetric in the sense that it is invariant under permutations. We give a characterization of such distributions in terms of a function q where q(h) is the marginal probability that any group of size h tests negative. We use this interpretable representation to show that the set partitioning problem arising in Dorfman's protocol can be reduced to an integer partitioning problem and efficiently solved. We apply these tools to an empirical dataset from the COVID-19 pandemic. The methodology helps explain the unexpectedly high empirical efficiency reported by the original investigators.
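The reduction to an integer partitioning problem can be illustrated with a minimal sketch. It assumes the standard Dorfman cost for a group of size h: one test if h = 1, and otherwise one pooled test plus h retests with probability 1 - q(h). The function names, the dynamic program, and the example prevalence are illustrative assumptions and are not taken from the paper, whose algorithm may differ.

```python
def expected_tests_per_group(h, q):
    """Expected tests spent on one group of size h (assumed Dorfman cost).

    q(h) is the probability that a group of size h tests negative.
    A group of one specimen is simply tested once.
    """
    if h == 1:
        return 1.0
    return 1.0 + h * (1.0 - q(h))


def optimal_group_sizes(n, q):
    """Choose group sizes summing to n that minimize expected tests.

    Because the cost of a group depends only on its size, it suffices to
    search over integer partitions of n, done here by dynamic programming.
    """
    U = [0.0] + [expected_tests_per_group(h, q) for h in range(1, n + 1)]
    best = [0.0] * (n + 1)    # best[m]: minimal expected tests for m specimens
    choice = [0] * (n + 1)    # choice[m]: one group size used in an optimum
    for m in range(1, n + 1):
        best[m], choice[m] = min(
            (U[h] + best[m - h], h) for h in range(1, m + 1)
        )
    sizes, m = [], n
    while m > 0:              # walk back through the recorded choices
        sizes.append(choice[m])
        m -= choice[m]
    return best[n], sorted(sizes, reverse=True)


# Hypothetical IID baseline: each specimen is positive with probability 0.01.
prevalence = 0.01
cost, sizes = optimal_group_sizes(80, lambda h: (1.0 - prevalence) ** h)
```

Any symmetric model can be substituted by passing a different q; the double loop takes on the order of n squared steps.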

Paper Structure (79 sections, 10 theorems, 19 equations, 2 figures)

Key Result

Proposition 3.2

Suppose $p$ is a distribution on $\{0,1\}^P$. Then $p$ is symmetric if and only if there exists a function $\alpha: \{0, \dots, n\} \to [0,1]$ such that $\sum_{k = 0}^{n} \alpha(k) = 1$ and $p(x) = \sum_{k = 0}^{n} \alpha(k) r_k(x)$ for all $x \in \{0,1\}^P$.
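As a hedged illustration of how the representation $\alpha$ relates to the negative-group probabilities $q$, the sketch below assumes that $r_k$ is the uniform distribution over outcomes with exactly $k$ positives among $n$ specimens (this page does not define $r_k$, so that reading is an assumption). Under it, a fixed group of size $h$ contains no positives with probability $\binom{n-k}{h} / \binom{n}{h}$ given $k$ positives overall. The function names and example numbers are hypothetical.

```python
from math import comb


def q_from_alpha(alpha, h):
    """Probability that a fixed group of size h has no positive specimen.

    alpha[k] is the probability of exactly k positives among n specimens,
    with n = len(alpha) - 1. Assumes positives are placed uniformly at
    random given their count k.
    """
    n = len(alpha) - 1
    return sum(alpha[k] * comb(n - k, h) / comb(n, h) for k in range(n + 1))


def iid_alpha(n, prevalence):
    """Binomial alpha for the independent, identically distributed model."""
    return [
        comb(n, k) * prevalence**k * (1.0 - prevalence) ** (n - k)
        for k in range(n + 1)
    ]


n, prevalence = 20, 0.1
alpha_iid = iid_alpha(n, prevalence)

# A strongly correlated symmetric model with the same marginal prevalence:
# with probability 0.9 nobody is positive, with probability 0.1 everybody is.
alpha_all_or_none = [0.9] + [0.0] * (n - 1) + [0.1]
```

For the IID model this recovers $q(h) = (1 - \text{prevalence})^h$. For the all-or-none model $q(h) = 0.9$ for every $h \ge 1$, which is larger than the IID value for $h > 1$.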

Figures (2)

  • Figure 1: Assumptions for Dorfman's two-stage adaptive group testing procedure with noiseless binary test results. (a) drops the independence assumption whereas (b) drops the identically distributed assumption.
  • Figure 2: Comparison of an independent and identically distributed (IID) model with a symmetric model for a population of size 80. (a) The representation $\alpha$ of these distributions, where $\alpha(k)$ is the probability of seeing $k$ positive specimens (see \ref{proposition:finitefinetti}). The IID model decays more rapidly. The symmetric distribution has non-monotonic decay; e.g., it assigns more mass to five positives than to four positives. (b) The representation $q$, where $q(h)$ is the probability that a group of size $h$ tests negative (see \ref{theorem:zeromarginals}). The IID model underestimates these probabilities. (c) The function $U$, where $U(h)$ is the expected number of tests used on a group of size $h$ (see \ref{lemma:Etestsgroup2}). The IID model overestimates these costs.

Theorems & Definitions (13)

  • Remark 3.1
  • Proposition 3.2
  • Lemma 3.3
  • Theorem 3.4
  • Proof 1
  • Proposition 3.5
  • Lemma 4.1
  • Lemma 4.3
  • Lemma 4.4
  • Lemma 4.6
  • ...and 3 more