Group Testing with General Correlation Using Hypergraphs

Hesam Nikpey, Saswati Sarkar, Shirin Saeedi Bidokhti

TL;DR

This work develops a unified probabilistic framework for group testing under general node-state correlations by modeling infection patterns as edges in a hypergraph with a probability mass function over edges. It introduces an adaptive greedy algorithm that updates posterior edge probabilities after each test, achieving an upper bound on the expected number of tests that scales with $H(X)$, the edge-entropy, plus the mean number of infections $\mu$; the method is shown to be near-optimal in several random-hypergraph regimes. The authors extend the framework to semi-non-adaptive and noisy settings, providing theoretical bounds and arguing that entropy is not always a tight lower bound, while $\mu$ may be a lower bound in certain regimes. They also show how the model subsumes prior independent and correlated models, recover/improve prior results, and discuss the potential for non-adaptive designs and broader correlation structures. The results highlight the fundamental role of correlation structure in reducing testing resources and offer practical guidance for adaptive, semi-adaptive, and noisy group testing in networks with complex dependencies.

Abstract

Group testing, a problem with diverse applications across multiple disciplines, traditionally assumes independence across nodes' states. Recent research, however, focuses on real-world scenarios that often involve correlations among nodes, challenging the simplifying assumptions made in existing models. In this work, we consider a comprehensive model for arbitrary statistical correlation among nodes' states. To capture and leverage these correlations effectively, we model the problem by hypergraphs, inspired by [GLS22], augmented by a probability mass function on the hyper-edges. Using this model, we first design a novel greedy adaptive algorithm capable of conducting informative tests and dynamically updating the distribution. Performance analysis provides upper bounds on the number of tests required, which depend solely on the entropy of the underlying probability distribution and the average number of infections. We demonstrate that the algorithm recovers or improves upon all previously known results for group testing settings with correlation. Additionally, we provide families of graphs where the algorithm is order-wise optimal and give examples where the algorithm or its analysis is not tight. We then generalize the proposed framework of group testing with general correlation in two directions, namely noisy group testing and semi-non-adaptive group testing. In both settings, we provide novel theoretical bounds on the number of tests required.

Paper Structure

This paper contains 38 sections, 18 theorems, 32 equations, 6 figures, 1 algorithm.

Key Result

Theorem 2.1

[li2014group] For the case where nodes are independent, i.e. $\mathcal{D}(X) = \prod_i [X_i p_{v_i} + (1-X_i)(1-p_{v_i})]$, for any algorithm that recovers the infection set with probability $1-\epsilon$ and performs $L$ tests, we have […]. On the other hand, there is an adaptive algorithm that finds the infected set with a probability that goes to 1 as $n\rightarrow\infty$ using $L \leq O(\mu + H(X))$ tests…
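
For the independent model of Theorem 2.1, both quantities in the upper bound have simple closed forms. The following worked equations assume $H(X)$ is measured in bits; the symbol $h$ for the binary entropy function is notation introduced here for illustration:

$$
\mu = \sum_i p_{v_i}, \qquad H(X) = \sum_i h(p_{v_i}), \qquad h(p) = -p\log_2 p - (1-p)\log_2(1-p).
$$

For instance, with $n = 1000$ nodes each infected independently with probability $0.01$, we get $\mu = 10$ and $H(X) = 1000\,h(0.01) \approx 80.8$ bits, so $\mu + H(X) \approx 91$, far below the 1000 tests needed to test every node individually.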

Figures (6)

  • Figure 1: A hypergraph with $V = \{v_1, v_2, v_3, v_4, v_5 \}$ and $E = \{ \{1,2,3\}, \{1,5\}, \{4,5\} \}$ where $p_{\{1,2,3\}} = 0.3$, $p_{\{1,5\}} = 0.2$, and $p_{\{4,5\}} = 0.5$.
  • Figure 2: A graph with 4 nodes and 4 edges. Each edge contains 3 nodes, for example, $e_2 = \{v_1, v_3, v_4\}$.
  • Figure 3: An example with $k = 4$ islands. Each edge contains either all nodes in an island or none of them. There are therefore $2^4 - 1 = 15$ edges. Three edges are shown in red. The nodes in one edge can be a proper subset of the nodes in another edge, e.g., one of the edges marked in red consists of islands $2, 3, 4$, and another consists only of island $4$. Nodes in island 1 are infected with probability $p_1 = \epsilon \approx 0$, nodes in islands 2 and 3 are infected with probability $p_2 = p_3 \approx 1/2$, and nodes in island 4 are infected with high probability $p_4 = 1-\gamma \approx 1$.
  • Figure 4: An illustration of $E(S)$. Here, $E(S) = \{ e_1, e_2\}$ but $e_3 \notin E(S)$ as one of its endpoints is outside of $S$. Hence, $w(S) = p_{e_1} + p_{e_2}.$ Test $V \setminus S$ is positive iff $e^* = e_3$ or $e^* = e_4$.
  • Figure 5: An instance of the graph from the nested example (label ex:nested) with $n = 9$ nodes.
  • ...and 1 more figure
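
The quantities that appear in the captions above can be computed directly for the small hypergraph of Figure 1. The sketch below is illustrative only: it assumes entropy is measured in bits, that $\mu$ is the expected size of the infected edge, and that $E(S)$ is the set of edges lying entirely inside $S$, as in the caption of Figure 4. The function names are hypothetical.

```python
from math import log2

# Hypergraph of Figure 1: edges (as node sets) with their probabilities.
PMF = {
    frozenset({1, 2, 3}): 0.3,
    frozenset({1, 5}): 0.2,
    frozenset({4, 5}): 0.5,
}

def edge_entropy(pmf):
    """Entropy H(X) in bits of the distribution over edges."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def mean_infections(pmf):
    """Expected number of infected nodes: sum over edges of p_e * |e|."""
    return sum(p * len(e) for e, p in pmf.items())

def weight_inside(pmf, s):
    """w(S): total probability of the edges fully contained in S, i.e. E(S)."""
    return sum(p for e, p in pmf.items() if e <= s)

H = edge_entropy(PMF)                    # about 1.485 bits
MU = mean_infections(PMF)                # 0.3*3 + 0.2*2 + 0.5*2 = 2.3
W = weight_inside(PMF, {1, 2, 3, 5})     # edges {1,2,3} and {1,5} lie inside S
```

With $S = \{1, 2, 3, 5\}$, the test on $V \setminus S = \{4\}$ is positive iff the infected edge is $\{4, 5\}$, which has probability $1 - w(S) = 0.5$.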

Theorems & Definitions (54)

  • Theorem 2.1
  • Theorem 2.2
  • Remark 2.3
  • Definition 3.1
  • Definition 3.2
  • Example 3.3
  • Example 3.4
  • Remark 3.5
  • Example 3.6
  • Remark 3.7
  • ...and 44 more